Article published in: Diachronica
Vol. 41:1 (2024), pp. 46–98
Disentangling Ancestral State Reconstruction in historical linguistics
Comparing classic approaches and new methods using Oceanic grammar
Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
This article was made Open Access under a CC BY-NC 4.0 license through payment of an APC by or on behalf of the author.
Published online: 7 March 2024
https://doi.org/10.1075/dia.22022.ski
Abstract
Ancestral State Reconstruction (ASR) is an essential part of historical linguistics (HL).
Conventional ASR in HL relies on three core principles: fewest changes on the tree, plausibility of changes and
plausibility of the resulting combinations of features in proto-languages. This approach has some problems, in particular the
definition of what is plausible and the disregard for branch lengths. This study compares the classic approach of ASR to
computational tools (Maximum Parsimony and Maximum Likelihood), conceptually and practically. Computational models have the
advantage of being more transparent, consistent and replicable, and the disadvantage of lacking nuanced knowledge and context.
Using data from the structural database Grambank, I compare reconstructions of the grammar of ancestral Oceanic languages from
the HL literature to those achieved by computational means. The results show that there is a high degree of
agreement between manual and computational approaches, with classical HL tending to align more closely with approaches that ignore branch lengths.
Explicitly taking branch lengths into account is more conceptually sound; as such the field of HL should
engage in improving methods in this direction. A combination of computational methods and qualitative knowledge is possible in the
future and would be of great benefit.
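The "fewest changes on the tree" principle named in the abstract can be illustrated with a minimal sketch of Fitch-style maximum parsimony for a single binary feature. This is only an illustration under stated assumptions, not the article's implementation: the toy tree, the language labels `A` to `D`, the feature values and the helper name `fitch` are all hypothetical and are not taken from the article or from Grambank.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A tree is either a tip label (str) or a pair of subtrees (strictly bifurcating).
# Branch lengths are deliberately absent: parsimony does not use them.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, tip_states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Return (most parsimonious states at the root of `tree`, minimum number of changes)."""
    if isinstance(tree, str):
        # A tip: its observed state, no changes needed.
        return frozenset({tip_states[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, tip_states)
    right_set, right_cost = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # The two daughters can agree: keep the shared states, no extra change.
        return shared, left_cost + right_cost
    # The daughters disagree: either state is possible, at the cost of one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical toy data: four languages, one binary feature ("1" = present, "0" = absent).
toy_tree: Tree = (("A", "B"), ("C", "D"))
toy_states = {"A": "1", "B": "1", "C": "1", "D": "0"}

root_states, changes = fitch(toy_tree, toy_states)
print(sorted(root_states), changes)
```

Because the function never looks at branch lengths, a change on a very short branch counts the same as a change on a very long one. That is the limitation the abstract points to; a likelihood-based method would treat the two cases differently.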
Résumé
La reconstruction de l’état ancestral (ASR) est une partie essentielle de la linguistique
historique (HL). L’ASR conventionnel en HL repose sur trois principes fondamentaux : le moins de
changements sur l’arbre, la plausibilité des changements et la plausibilité des combinaisons de caractéristiques résultantes dans
les protolangues. Cette approche présente quelques problèmes, en particulier la définition de ce qui est plausible et l’ignorance
des longueurs de branche. Cette étude compare l’approche classique de l’ASR aux outils informatiques (Maximum Parsimony
et Maximum Likelihood), sur les plans conceptuel et pratique. Les modèles informatiques ont l’avantage d’être plus transparents,
cohérents et reproductibles, et le désavantage de manquer des connaissances et des contextes nuancés. À l’aide de la base de
données structurelle Grambank, je compare les reconstructions de la grammaire des langues océaniennes ancestrales de la
littérature linguistique historique à celles réalisées par des moyens informatiques. Les résultats montrent qu’il existe un degré
élevé d’accord entre les approches manuelles et informatiques, avec une tendance pour la HL classique à s’accorder
davantage avec les approches qui ignorent les longueurs de branche. La prise en compte explicite des longueurs de branche est plus
appropriée du point de vue conceptuel. En tant que tel, la linguistique historique devrait s’engager dans l’amélioration des
méthodes dans cette direction. Une combinaison de méthodes informatiques et de connaissances qualitatives est possible à l’avenir
et serait très bénéfique.
Zusammenfassung
Ancestral State Reconstruction (ASR) ist ein wesentlicher Bestandteil der historischen Linguistik
(HL). Konventionelle ASR in der HL basiert auf drei Grundprinzipien: möglichst wenige Änderungen
des Baumes, Plausibilität von Änderungen und Plausibilität der resultierenden Protosprachen. Dieser Ansatz weist einige Probleme
auf, insbesondere die Definition von plausibel und die Nichtberücksichtigung der Länge von Zweigen. Die vorliegende Studie
vergleicht den klassischen Ansatz von ASR konzeptionell und praktisch mit computergestützten Werkzeugen (Maximum
Parsimony und Maximum Likelihood). Computergestützte Modelle haben den Vorteil, dass sie transparenter, konsistenter und
reproduzierbarer sind, und den Nachteil, dass differenziertes Wissen und Kontext nur begrenzt berücksichtigt werden. Anhand von
Daten aus der Grambank-Datenbank, die grammatische und strukturelle Merkmale beinhaltet, vergleiche ich Rekonstruktionen der
Grammatik der ozeanischen Ursprungssprachen aus der historischen linguistischen Literatur mit solchen, die mit computergestützten
Werkzeugen erzielt wurden. Die Ergebnisse zeigen, dass es ein hohes Maß an Übereinstimmung zwischen Ergebnissen aus manuellen und
computergestützten Ansätzen gibt, wobei die klassische HL tendenziell eher mit Ansätzen übereinstimmt, die die Länge von
Zweigen ignorieren. Die explizite Berücksichtigung von Zweiglängen ist konzeptionell fundierter, daher sollte sich die
HL mit der Verbesserung der Methoden in dieser Richtung befassen. Eine Kombination aus computergestützten Methoden
und qualitativem Wissen ist künftig möglich und wäre von großem Nutzen.
Article outline
- 1. Introduction
- 2. Background
  - 2.1 The methods of ancestral state reconstruction in traditional historical linguistics
    - 2.1.1 Disagreements in HL
  - 2.2 Evaluating if the data are valid for phylogenetic analysis: The Double Cognacy Condition and phylogenetic signal
  - 2.3 Computational phylogenetic methods
- 3. Materials and methods
  - 3.1 Methods: Maximum parsimony, maximum likelihood and most common
  - 3.2 Calculation of similarity between predictions from conventional HL and computational approaches
  - 3.3 Data
    - 3.3.1 The Grambank dataset
    - 3.3.2 Data coverage
    - 3.3.3 The trees
    - 3.3.4 Data from historical linguistics on Oceanic proto-language grammar
- 4. Results
  - 4.1 Concordance between traditional HL and computational methods
  - 4.2 New predictions
  - 4.3 Where the conflicts are: Ergativity
- 5. Conclusions
- Acknowledgements
- Notes
- Abbreviations
- Supplementary material
- References
