Statistics for Linguists: An Introduction Using R is the first statistics textbook on linear models for linguistics. The book covers simple uses of linear models through generalized models to more advanced approaches, maintaining its focus on conceptual issues and avoiding excessive mathematical details. It contains many applied examples using the R statistical programming environment. Written in an accessible tone and style, this text is the ideal main resource for graduate and advanced undergraduate students of Linguistics statistics courses as well as those in other fields, including Psychology, Cognitive Science, and Data Science.
About the author
Bodo Winter is Lecturer in Cognitive Linguistics in the Department of English Language and Applied Linguistics at the University of Birmingham, UK.
Table of contents
0. Preface: Approach and how to use this book
0.1. Strategy of the book
0.2. Why R?
0.3. Why the tidyverse?
0.4. R packages required for this book
0.5. What this book is not
0.6. How to use this book
0.7. Information for teachers
1. Introduction to base R
1.1. Introduction
1.2. Baby steps: simple math with R
1.3. Your first R script
1.4. Assigning variables
1.5. Numeric vectors
1.6. Indexing
1.7. Logical vectors
1.8. Character vectors
1.9. Factor vectors
1.10. Data frames
1.11. Loading in files
1.12. Plotting
1.13. Installing, loading, and citing packages
1.14. Seeking help
1.15. A note on keyboard shortcuts
1.16. Your R journey: The road ahead
2. Tidy functions and reproducible R workflows
2.1. Introduction
2.2. tibble and readr
2.3. dplyr
2.4. ggplot2
2.5. Piping with magrittr
2.6. A more extensive example: iconicity and the senses
2.7. R markdown
2.8. Folder structure for analysis projects
2.9. Readme files and more markdown
2.10. Open and reproducible research
3. Models and distributions
3.1. Models
3.2. Distributions
3.3. The normal distribution
3.4. Thinking of the mean as a model
3.5. Other summary statistics: median and range
3.6. Boxplots and the interquartile range
3.7. Summary statistics in R
3.8. Exploring the emotional valence ratings
3.9. Chapter conclusions
4. Introduction to the linear model: Simple linear regression
4.1. Word frequency effects
4.2. Intercepts and slopes
4.3. Fitted values and residuals
4.4. Assumptions: Normality and constant variance
4.5. Measuring model fit with R²
4.6. A simple linear model in R
4.7. Linear models with tidyverse functions
4.8. Model formula notation: Intercept placeholders
4.9. Chapter conclusions
5. Correlation, linear, and nonlinear transformations
5.1. Centering
5.2. Standardizing
5.3. Correlation
5.4. Using logarithms to describe magnitudes
5.5. Example: Response durations and word frequency
5.6. Centering and standardization in R
5.7. Terminological note on the term ‘normalizing’
5.8. Chapter conclusions
6. Multiple regression
6.1. Regression with more than one predictor
6.2. Multiple regression with standardized coefficients
6.3. Assessing assumptions
6.4. Collinearity
6.5. Adjusted R²
6.6. Chapter conclusions
7. Categorical predictors
7.1. Introduction
7.2. Modeling the emotional valence of taste and smell words
7.3. Processing the taste and smell data
7.4. Treatment coding in R
7.5. Doing dummy coding ‘by hand’
7.6. Changing the reference level
7.7. Sum coding in R
7.8. Categorical predictors with more than two levels
7.9. Assumptions again
7.10. Other coding schemes
7.11. Chapter conclusions
8. Interactions and nonlinear effects
8.1. Introduction
8.2. Categorical * continuous interactions
8.3. Categorical * categorical interactions
8.4. Continuous * continuous interactions
8.5. Continuous interactions and regression planes
8.6. Higher-order interactions
8.7. Chapter conclusions
9. Inferential statistics 1: Significance testing
9.1. Introduction
9.2. Effect size: Cohen’s d
9.3. Cohen’s d in R
9.4. Standard errors and confidence intervals
9.5. Null hypotheses
9.6. Using t to measure the incompatibility with the null hypothesis
9.7. Using the t-distribution to compute p-values
9.8. Chapter conclusions
10. Inferential statistics 2: Issues in significance testing
10.1. Common misinterpretations of p-values
10.2. Statistical power and Type I, II, M, and S errors
10.3. Multiple testing
10.4. Stopping rules
10.5. Chapter conclusions
11. Inferential statistics 3: Significance testing in a regression context
11.1. Introduction
11.2. Standard errors and confidence intervals for regression coefficients
11.3. Significance tests with multi-level categorical predictors
11.4. Another example: the absolute valence of taste and smell words
11.5. Communicating uncertainty for categorical predictors
11.6. Communicating uncertainty for continuous predictors
11.7. Chapter conclusions
12. Generalized linear models: Logistic regression
12.1. Motivating generalized linear models
12.2. Theoretical background: Data-generating processes
12.3. The log odds function and interpreting logits
12.4. Speech errors and blood alcohol concentration
12.5. Predicting the dative alternation
12.6. Analyzing gesture perception: Hassemer & Winter (2016)
12.6.1. Exploring the dataset
12.6.2. Logistic regression analysis
12.7. Chapter conclusions
13. Generalized linear models 2: Poisson regression
13.1. Motivating Poisson regression
13.2. The Poisson distribution
13.3. Analyzing linguistic diversity using Poisson regression
13.4. Adding exposure variables
13.5. Negative binomial regression for overdispersed count data
13.6. Overview and summary of the generalized linear model framework
13.7. Chapter conclusions
14. Mixed models 1: Conceptual introduction
14.1. Introduction
14.2. The independence assumption
14.3. Dealing with non-independence via experimental design and averaging
14.4. Mixed models: Varying intercepts and varying slopes
14.5. More on varying intercepts and varying slopes
14.6. Interpreting random effects and random effect correlations
14.7. Specifying mixed effects models: lme4 syntax
14.8. Reasoning about your mixed model: The importance of varying slopes
14.9. Chapter conclusions
15. Mixed models 2: Extended example, significance testing, convergence issues
15.1. Introduction
15.2. Simulating vowel durations for a mixed model analysis
15.3. Analyzing the simulated vowel durations with mixed models
15.4. Extracting information out of lme4 objects
15.5. Messing up the model
15.6. Likelihood ratio tests
15.7. Remaining issues
15.7.1. R-squared for mixed models
15.7.2. Predictions from mixed models
15.7.3. Convergence issues
15.8. Mixed logistic regression: Ugly selfies
15.9. Shrinkage and individual differences
15.10. Chapter conclusions
16. Outlook and strategies for model building
16.1. What you have learned so far
16.2. Model choice
16.3. The cookbook approach
16.4. Stepwise regression
16.5. A plea for subjective and theory-driven statistical modeling
16.6. Reproducible research
16.7. Closing words
References
Appendix A. Correspondences between significance tests and linear models
Appendix B. Reading recommendations
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278. https://doi.org/10.1016/j.jml.2012.11.001 On the linguistics community taking notice of mixed models...
Statistics for Linguists: An Introduction Using R (2020) by Bodo Winter takes a unique approach to introducing the statistics of linear models for linguistics, in that it builds model-based thinking instead of test-based thinking. Winter explains that he t...
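1 helpful Li 2020-02-25 04:45:52
A statistics textbook by the well-known Bodo Winter. The basic idea is to drop the assortment of statistical tests and use linear regression for everything. This seems to be the trend in language science: nearly everyone has abandoned ANOVA and the like and uses mixed effects models throughout.
0 helpful 滄浪水 2021-04-22 13:22:22
In my view, the book's discussion of methodology is more valuable than its teaching of the methods themselves. The step-by-step guidance on specific methods is reasonably detailed. Although the author provides the code, it is still better to type it out yourself while reading, since the code in the text contains a few slips that you only notice once you have typed it.
0 helpful 自动化丝绒轮盘 2023-02-24 04:26:03 Germany
The layout is very dense and uniform, quite unlike American-style textbooks; it is a bit like that Python-for-linguistics book I read earlier. Basically, it is all text…
0 helpful momo 2020-05-24 23:16:18
Very practical!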
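0 helpful 泥豆尼痘昵 2020-11-15 10:26:21
On content quality alone the book deserves only three stars: first, it is hardly the best of its kind; second, as an introductory text it offers no particularly original insight. A competent researcher will eventually have to read other books on R, multiple linear regression, GLMs, and mixed models anyway, so why spend time on this one first? I ended up giving four stars because the two pages of reading recommendations in Appendix B are very useful, and because the book also touches on R Markdown, which is rare among comparable books and deserves encouragement. The word "linguists" in the title reflects the fact that all the examples come from that field; the author also comments on the field, for instance that everyone has mindlessly adopted "keep it maximal", which adds a touch of interest. Overall, if you have time, the book is worth browsing to refresh the most basic concepts, but when you need advice or solutions for a specific project you should still consult more in-depth material. If you are short on time, go straight to the top books on each topic rather than this one.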
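0 helpful closedloop 2023-08-14 13:49:28 Sichuan
I am glad my introduction to statistics started with this book. It is very easy to get started with, the code and explanations are accessible, and it is especially friendly for understanding basic concepts. Even with no background at all, I was able to progress step by step into linear regression.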
0 helpful Seamus🌈 2022-10-11 03:29:57 UK
A blessing for humanities students 🥹
0 helpful zimine 2022-03-01 07:13:07
not bad as an intro. lx160 teaching material.