In: Corpus Methods for Semantics: Quantitative studies in polysemy and synonymy
Edited by Dylan Glynn and Justyna A. Robinson
[Human Cognitive Processing 43] 2014
pp. 487–533
Logistic regression
A confirmatory technique for comparisons in corpus linguistics
Published online: 6 November 2014
https://doi.org/10.1075/hcp.43.18spe
This text offers an introduction to binary logistic regression, a confirmatory technique for statistically modelling the effect of one or several predictors on a binary response variable. We explain why logistic regression is exceptionally well suited to the comparison of near-synonyms in corpus data: the technique allows the researcher to identify the different factors that affect the choice between near-synonyms and to tease apart their respective effects. Moreover, the technique copes well with the kind of unbalanced data sets that are typical of corpus linguistics. First, we describe the contexts in which logistic regression is applicable and give examples of the types of research questions for which it is an appropriate tool. Next, we explain why and how logistic regression analysis differs from linear regression analysis, and we illustrate how the output of a logistic regression analysis can be interpreted, using the study of an alternation pattern in Dutch as our example. The R code used in the case study is explained in detail, and a URL is given from which the R code and data sets can be downloaded. Finally, we give suggestions for further reading.
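As a brief illustration of the contrast with linear regression, the standard textbook form of the binary logistic regression model can be written as below. The notation is generic and assumed here, not taken from the chapter: p is the probability of one of the two response values, x_1, ..., x_k are the predictors, and the β's are the coefficients to be estimated.

```latex
\[
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\]
```

Linear regression models the response itself as a linear function of the predictors. Here the linear function describes the log odds instead, so the predicted probability always lies strictly between 0 and 1.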
