Article published in: Demystifying New Methods in Historical Linguistics
Edited by Erich Round
[Diachronica 41:3] 2024
pp. 330–354
An agent-based modelling approach to wave-like diversification of language families
Published online: 2 July 2024
https://doi.org/10.1075/dia.23010.har
Abstract
In contrast to phylogenetic tree inference, wave-model approaches are often regarded as difficult to
implement computationally for inferring language relatedness. This paper proposes a basic framework for the computational
modelling of wave-like diversification in language families and introduces agent-based models as a model type for linguistic data.
The approach is based on agent-based simulations, which allow for the detailed simulation of speaker interactions within speech
communities. The proposed framework operates by simulating a large number of possible diversification scenarios with different
parameter settings and selecting those runs that yield a good fit to the linguistic data on the geographical spread of the
languages. The model can be fed with the geographical extent of languages and their known innovations in order to computationally
reconstruct the most likely diversification scenarios of these languages under the wave model.
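
The simulate-and-select idea described above can be illustrated with a minimal toy sketch. It is not the paper's implementation: the one-dimensional line of sites, the parameters `origin` and `spread_prob`, the agreement-based `fit` score, and the keep-the-best-runs rule are all assumptions made for illustration, since the abstract does not specify the actual model, priors, or fit measure.

```python
import random


def simulate_wave(n_sites, origin, spread_prob, steps, rng):
    """Spread one innovation along a line of sites; return adoption flags.

    Hypothetical toy model: each adopting site passes the innovation to
    each direct neighbour with probability `spread_prob` per step.
    """
    adopted = [False] * n_sites
    adopted[origin] = True
    for _ in range(steps):
        snapshot = adopted[:]  # update synchronously from the previous state
        for i, has_it in enumerate(snapshot):
            if not has_it:
                continue
            for j in (i - 1, i + 1):
                if 0 <= j < n_sites and rng.random() < spread_prob:
                    adopted[j] = True
    return adopted


def fit(simulated, observed):
    """Share of sites where simulated and observed adoption agree."""
    return sum(s == o for s, o in zip(simulated, observed)) / len(observed)


def select_runs(observed, n_runs=2000, steps=5, keep=50, seed=1):
    """Run many simulations with random parameters and keep the best fits."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        origin = rng.randrange(len(observed))  # assumed uniform prior on origin
        spread_prob = rng.random()             # assumed uniform prior on [0, 1)
        sim = simulate_wave(len(observed), origin, spread_prob, steps, rng)
        runs.append((fit(sim, observed), origin, spread_prob))
    runs.sort(key=lambda r: r[0], reverse=True)
    return runs[:keep]


# Invented example data: which of ten neighbouring varieties show an innovation.
observed = [False, False, True, True, True, True, False, False, False, False]
best = select_runs(observed)
```

Each entry of `best` is a `(fit, origin, spread_prob)` tuple. In this toy setting, the distribution of `origin` and `spread_prob` among the retained runs plays the role of the reconstructed diversification scenarios; a realistic model would replace the line of sites with geographic data and many interacting innovations.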
Résumé
In contrast to Bayesian phylogenetic inference, approaches based on wave theory are often
considered difficult to implement computationally for inferring genealogical relatedness among languages.
This article proposes a basic framework for the computational modelling of
wave-like linguistic diversification and explains the agent-based model type for
linguistic data. The approach relies on agent-based simulations, which make it possible to simulate in detail the
interactions between speakers within speech communities. The proposed framework works by simulating a large number of
possible diversification situations with different parameters and selecting those that best match the
linguistic data on the geographical distribution of the different languages. The model can be fed with
the geographical extent of languages and their known innovations in order to computationally reconstruct the most probable
diversification scenarios of these languages under wave theory.
Zusammenfassung
In contrast to the inference of phylogenetic trees, wave models are often regarded as difficult to implement
with computational methods for determining language relatedness. This article proposes
a basic framework for the computational modelling of wave-like language diversification and describes the model type of
agent-based models for linguistic data. The approach is based on computer simulations that allow a
detailed simulation of speaker interactions within speech communities. The proposed
model simulates a large number of possible diversification scenarios with different parameter settings and selects
those that show a good fit to the linguistic data on the geographical spread of the language family
in question. The model can be fed with the geographical extent of languages and their known innovations
in order to computationally reconstruct the most likely diversification scenarios of these languages under the
wave model.
Article outline
- 1. Introduction
- 2. The model
  - 2.1 The test case setup
  - 2.2 Model architecture
    - 2.2.1 The modules of the ABM
    - 2.2.2 Running and evaluation
    - 2.2.3 A note on sampling algorithms for ABMs
    - 2.2.4 Model selection and prior choices
- 3. Results
  - 3.1 Model comparison
  - 3.2 Innovation reconstruction
  - 3.3 Parameter inference
- 4. Discussion and outlook
- 5. Conclusion
- Acknowledgements
- Notes
- Abbreviations
