Article published in: Journal of Pidgin and Creole Languages: Online-First Articles
Measuring the similarity between languages
The case of creoles and non-creoles
Published online: 26 May 2025
https://doi.org/10.1075/jpcl.24026.pla
Abstract
In typology, statistical methods have been used successfully to assess similarities and differences between
languages. In creole studies, by contrast, the use of quantitative methods has been controversial, and many
methodological aspects of the statistical models used have been criticized (e.g. Meakins 2022; Bakker 2023).
This paper investigates two methodological problems that have not yet been examined critically: which
statistical models produce which results, and how the amount of missing values in a data set influences the
results. We present a study in which we tested different statistical models on 21 features from two arbitrarily
chosen domains (‘Word Order’ and ‘Nominal Categories’) from the WALS (Dryer & Haspelmath 2013) and APiCS
(Michaelis et al. 2013) databases. We demonstrate that different statistical methods yield similar results, and
that different sample sizes do not dramatically influence the model outcomes.
Keywords: typology, similarity, creole, non-creole, phylogenetic network, statistical modeling, APiCS, WALS
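The abstract's second question, how missing values bear on similarity measures, can be made concrete with a small sketch. The code below is not the authors' procedure: the language names, feature codings, and the `distance` helper are all hypothetical, and the measure shown (share of mismatching features among those coded in both languages) is only one simple way of handling gaps in WALS- or APiCS-style data.

```python
from itertools import combinations

# Hypothetical toy data: one list of feature values per language.
# None marks a missing value. Names and codings are invented for illustration.
LANGUAGES = {
    "creole_A":     ["SVO", "none", "prep", None],
    "creole_B":     ["SVO", "none", "prep", "indef"],
    "non_creole_C": ["SOV", "case", "post", "indef"],
    "non_creole_D": ["SOV", "case", None, None],
}


def distance(a, b):
    """Share of mismatching features among those coded in both languages."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None  # no overlapping features: the distance is undefined
    mismatches = sum(x != y for x, y in shared)
    return mismatches / len(shared)


def distance_matrix(languages):
    """Pairwise distances, keyed by alphabetically ordered language pairs."""
    return {
        (m, n): distance(languages[m], languages[n])
        for m, n in combinations(sorted(languages), 2)
    }


if __name__ == "__main__":
    for pair, d in distance_matrix(LANGUAGES).items():
        print(pair, d)
```

A matrix of this kind could serve as input to a clustering or network method. Note that each distance rests on fewer comparisons as the number of missing values grows, which is one reason the amount of missing data might matter for the outcome.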
Article outline
- 1. Introduction
- 2. Quantitative typology: Merits and pitfalls
- 2.1 Why compare (or not compare)?
- 2.2 Data
- 2.3 Statistical tools
- 2.4 The present study
- 3. Methodology
- 3.1 Data
- 3.2 Coding
- 3.3 Sampling
- 3.4 Statistical modeling
- 4. Results 1: Clustering
- 4.1 Phylogenetic networks
- 4.1.1 Neighbor-joining vs neighbor net networks
- 4.1.2 Neighbor-joining networks
- 4.1.3 Hierarchical cluster analysis
- 5. Results 2: Predicting the type of language
- 5.1 Recursive partitioning and regression trees
- 5.2 Random forests
- 6. Summary and discussion
- Acknowledgements
- Notes
References (49)
Baayen, Harald. 2008. Analyzing linguistic data. A practical introduction to statistics. Cambridge: Cambridge University Press.
Baker, Philip. 1990. Off target? Journal of Pidgin and Creole Languages 5(1). 107–119.
Bakker, Peter. 2023. Empiricism against imperialism: Science, dogma and the neocolonial heritage of creole studies. Reflections on Meakins (2022). Journal of Pidgin and Creole Languages.
Bakker, Peter, Finn Borchsenius, Carsten Levisen & Eeva M. Sippola. 2017. Creole studies: Phylogenetic approaches. John Benjamins Publishing Company.
Bakker, Peter, Aymeric Daval-Markussen, Mikael Parkvall & Ingo Plag. 2011. Creoles are typologically distinct from non-creoles. Journal of Pidgin and Creole Languages 26(1). 5–42.
Bickel, Balthasar. 2007. Typology in the 21st century: Major current developments. Linguistic Typology 11(1). 239–251.
Blasi, Damián E., Susanne Maria Michaelis & Martin Haspelmath. 2017. Grammars are robustly transmitted even during the emergence of creole languages. Nature Human Behaviour 1(10). 723–729.
Bouckaert, Remco, Philippe Lemey, Michael Dunn, Simon J. Greenhill, Alexander V. Alekseyenko, Alexei J. Drummond, Russell D. Gray, Marc A. Suchard & Quentin D. Atkinson. 2012. Mapping the origins and expansion of the Indo-European language family. Science 337(6097). 957–960.
Cysouw, Michael. 2008. Using the World Atlas of Language Structures. Language Typology and Universals 61(3). 181–185.
Daval-Markussen, Aymeric. 2019. Reconstructing creole. Aarhus: Aarhus University PhD dissertation.
Dunn, Michael, Angela Terrill, Ger Reesink, Robert A. Foley & Stephen C. Levinson. 2005. Structural phylogenetics and the reconstruction of ancient language history. Science 309(5743). 2072–2075.
Efron, Bradley. 1983. Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association 78(382). 316–331.
Friedman, Susan Stanford. 2013. Why not compare? In Rita Felski & Susan Stanford Friedman (eds.), Comparison: Theories, approaches, uses, 35–45. Johns Hopkins University Press.
Gorman, Ben. 2016. mltools: Machine Learning Tools. R package version 0.3.5. Comprehensive R Archive Network. URL [URL]
Guzmán Naranjo, Matías & Laura Becker. 2022. Statistical bias control in typology. Linguistic Typology 26(3). 605–670.
Haspelmath, Martin. 2010. Comparative concepts and descriptive categories in crosslinguistic studies. Language 86(3). 663–687.
Jaeger, T. Florian, Peter Graff, William Croft & Daniel Pontillo. 2011. Mixed effect models for genetic and areal dependencies in linguistic typology. Linguistic Typology 15(2). 281–320.
Kuhn, Max. 2008. Building predictive models in R using the caret package. Journal of Statistical Software 28(5). 1–26. URL [URL]
Lander, Yury & Peter Arkadiev. 2016. On the right of being a comparative concept. Linguistic Typology 20(2). 403–416.
Lefebvre, Claire. 1998. Creole genesis and the acquisition of grammar: The case of Haitian Creole. Cambridge: Cambridge University Press.
Levy, Dan & Lior Pachter. 2011. The neighbor-net algorithm. Advances in Applied Mathematics 47(2). 240–258.
Lindstromberg, Seth. 2022. P-curving as a safeguard against p-hacking in SLA research: A case study. Studies in Second Language Acquisition 44(4). 1155–1180.
List, Johann-Mattis. 2021. Computer-assisted approaches to historical language comparison. Jena: Friedrich-Schiller-Universität Jena, Philosophische Fakultät Habilitation Thesis.
Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert & Kurt Hornik. 2025. cluster: Cluster analysis basics and extensions. R package version 2.1.8.1. URL [URL]
McWhorter, John H. 1998. Identifying the creole prototype: Vindicating a typological class. Language 74(4). 788–818.
Meakins, Felicity. 2022. Empiricism or imperialism: The science of creole exceptionalism. Journal of Pidgin and Creole Languages 37(1). 189–203.
Michaelis, Susanne Maria, Philippe Maurer, Martin Haspelmath & Magnus Huber (eds.). 2013. The atlas of pidgin and creole language structures. Oxford University Press, USA.
Murawaki, Yugo. 2016. Statistical modeling of creole genesis. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human language technologies, 1329–1339.
Muysken, Pieter. 1988. Are creoles a special type of language? In Frederick J. Newmeyer (ed.), Linguistics: The Cambridge survey, 285–301. Cambridge University Press.
Paradis, Emmanuel & Klaus Schliep. 2019. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35(3). 526–528.
Parkvall, Mikael. 2008. The simplicity of creoles in a cross-linguistic perspective. In Matti Miestamo, Fred Karlsson & Kaius Sinnemäki (eds.), Language complexity: Typology, contact, change, 265–285. John Benjamins Publishing Company.
Plag, Ingo. 2008. Creoles as interlanguages: Inflectional morphology. Journal of Pidgin and Creole Languages 23(1). 114–135.
Plag, Ingo. 2011. Creolization and admixture: Typology, feature pools, and second language acquisition. Journal of Pidgin and Creole Languages 26(1). 89–110.
R Core Team. 2021. R: A language and environment for statistical computing. URL [URL]
Radhakrishnan, R. 2013. Why compare? In Rita Felski & Susan Stanford Friedman (eds.), Comparison: Theories, approaches, uses, 15–33. Johns Hopkins University Press.
Roettger, Timo B. 2019. Researcher degrees of freedom in phonetic research. Laboratory Phonology 10(1).
Schliep, Klaus, Alastair J. Potts, David A. Morrison & Guido W. Grimm. 2017. Intertwining phylogenetic trees and networks. Methods in Ecology and Evolution 8(10). 1212–1220.
Skirgård, Hedvig, Hannah J. Haynie, Damián E. Blasi, Harald Hammarström, Jeremy Collins, Jay J. Latarche, Jakob Lesage, Tobias Weber, Alena Witzlack-Makarevich, Sam Passmore, Angela Chira, Luke Maurits, Russell Dinnage, Michael Dunn, Ger Reesink, Ruth Singer, Claire Bowern, Patience Epps, Jane Hill, Outi Vesakoski, Martine Robbeets, Noor Karolin Abbas, Daniel Auer, Nancy A. Bakker, Giulia Barbos, Robert D. Borges, Swintha Danielsen, Luise Dorenbusch, Ella Dorn, John Elliott, Giada Falcone, Jana Fischer, Yustinus Ghanggo Ate, Hannah Gibson, Hans-Philipp Göbel, Jemima A. Goodall, Victoria Gruner, Andrew Harvey, Rebekah Hayes, Leonard Heer, Roberto E. Herrera Miranda, Nataliia Hübler, Biu Huntington-Rainey, Jessica K. Ivani, Marilen Johns, Erika Just, Eri Kashima, Carolina Kipf, Janina V. Klingenberg, Nikita König, Aikaterina Koti, Richard G. A. Kowalik, Olga Krasnoukhova, Nora L. M. Lindvall, Mandy Lorenzen, Hannah Lutzenberger, Tônia R. A. Martins, Celia Mata German, Suzanne van der Meer, Jaime Montoya Samamé, Michael Müller, Saliha Muradoğlu, Kelsey Neely, Johanna Nickel, Miina Norvik, Cheryl Akinyi Oluoch, Jesse Peacock, India O. C. Pearey, Naomi Peck, Stephanie Petit, Sören Pieper, Mariana Poblete, Daniel Prestipino, Linda Raabe, Amna Raja, Janis Reimringer, Sydney C. Rey, Julia Rizaew, Eloisa Ruppert, Kim K. Salmon, Jill Sammet, Rhiannon Schembri, Lars Schlabbach, Frederick W. P. Schmidt, Amalia Skilton, Wikaliler Daniel Smith, Hilário de Sousa, Kristin Sverredal, Daniel Valle, Javier Vera, Judith Voß, Tim Witte, Henry Wu, Stephanie Yam, Jingting Ye, Maisie Yong, Tessa Yuditha, Roberto Zariquiey, Robert Forkel, Nicholas Evans, Stephen C. Levinson, Martin Haspelmath, Simon J. Greenhill, Quentin D. Atkinson & Russell D. Gray. 2023. Grambank
reveals the importance of genealogical constraints on linguistic diversity and highlights the impact of language loss. Science Advances 9(16).
Milborrow, Stephen. 2011. rpart.plot. URL [URL]
Hothorn, Torsten & Achim Zeileis. 2009. partykit: A toolkit for recursive partytioning. URL [URL]
Hothorn, Torsten, Kurt Hornik, Carolin Strobl & Achim Zeileis. 2009. party: A laboratory for recursive partytioning. URL [URL]