Article published in: Diachronica
Vol. 42:1 (2025), pp. 1–46
Paradigmatic complexity metrics as signals of phylogenetic relatedness
A proof of concept in Romance and Pamean diachrony
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Open Access publication of this article was funded through a Transformative Agreement with University of Zurich.
Published online: 4 March 2025
https://doi.org/10.1075/dia.23004.her
Abstract
Morphological complexity metrics like entropy, and notions like the Paradigm Cell-Filling Problem, have recently (re)gained popularity for the synchronic analysis of inflectional systems. The potential of these quantitative approaches for diachronic research, however, remains largely untapped. This paper constitutes a first exploration of whether and, if so, how these methods can be used profitably in this domain. We first use Romance, for which we have rich historical knowledge, to establish the diagnostic value of complexity metrics for phylogenetic relatedness under a best-case scenario. We then apply these methods to Pamean (Otomanguean, Mexico) to show that the same metrics help diagnose the phylogenetic relatedness of tenses and inflection classes even when, as is the case in this family, most of the morphological material has been replaced or altered beyond recognition. Results suggest that complexity metrics can successfully diagnose phylogenetic relatedness over extended periods of time and fruitfully complement traditional qualitative approaches.
Keywords: paradigm structure, complexity metrics, diachrony, relatedness, cognacy
Article outline
- 1. Introduction
- 2. Paradigm complexity metrics
- 2.1 Distillations
- 2.2 n-MPS entropy
- 2.3 Static vs. dynamic principal parts
- 2.4 Density (of static and dynamic principal parts)
- 2.5 Cell predictability
- 2.6 Inflection class predictability
- 2.7 Cell predictiveness
- 2.8 Cell predictor number
- 2.9 Other metrics
- 3. The quantitative profile of Romance conjugations
- 4. Diagnostic use of IC complexity metrics in Oto-Pamean languages
- 4.1 Quantitative matching of Pame homologous tenses and ICs
- 4.2 Qualitative matching of Pame homologous tenses and ICs
- 5. Conclusion
- Supplementary materials
- Acknowledgements
- Notes
- References
