Reflex prediction: A case study of Western Kho-Bwa

Bodt, Timotheus A.; List, Johann-Mattis

doi:10.1075/dia.20009.bod

Article published in: Diachronica
Vol. 39:1 (2022) ► pp. 1–38

Timotheus A. Bodt | University of London

Johann-Mattis List | Max Planck Institute for the Science of Human History

Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.

For any use beyond this license, please contact the publisher at rights@benjamins.nl.

Published online: 23 April 2021

https://doi.org/10.1075/dia.20009.bod

Abstract

While analysing lexical data of Western Kho-Bwa languages of the Sino-Tibetan or Trans-Himalayan family with the help of a computer-assisted approach for historical language comparison, we observed gaps in the data where one or more varieties lacked forms for certain concepts. We employed a new workflow, combining manual and automated steps, to predict the most likely phonetic realisations of the missing forms in our data, by making systematic use of the information on sound correspondences in words that were potentially cognate with the missing forms. This procedure yielded a list of hypothetical reflexes of previously identified cognate sets, which we first preregistered as an experiment on the prediction of unattested word forms and then compared with actual word forms elicited during secondary fieldwork. In this study we first describe the workflow which we used to predict hypothetical reflexes and the process of elicitation of actual word forms during fieldwork. We then present the results of our reflex prediction experiment. Based on this experiment, we identify four general benefits of reflex prediction in historical language comparison. These comprise (1) an increased transparency of linguistic research, (2) an increased efficiency of field and source work, (3) an educational aspect which offers teachers and learners examples of a wide range of linguistic phenomena, including the regularity of sound change, and (4) the possibility of kindling speakers’ interest in their own linguistic heritage.
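
The core idea of predicting a missing form from sound correspondences can be illustrated with a minimal sketch. It is not the authors' workflow: the variety names `A`, `B` and `T`, all word forms, and the helper functions `learn_correspondences` and `predict_reflex` are hypothetical, and a simple vote over pairwise segment correspondences stands in for the correspondence-pattern inference used in the study. The sketch assumes the cognate sets are already aligned segment by segment.

```python
from collections import Counter, defaultdict

# Invented, pre-aligned cognate sets (NOT Western Kho-Bwa data).
# "T" is the hypothetical target variety whose form we want to predict.
COGNATE_SETS = [
    {"A": ["p", "a", "n"], "B": ["b", "a", "n"], "T": ["f", "a", "ŋ"]},
    {"A": ["p", "i"], "B": ["b", "i"], "T": ["f", "i"]},
    {"A": ["t", "a", "n"], "B": ["d", "a", "n"], "T": ["t", "a", "ŋ"]},
    {"A": ["t", "u"], "B": ["d", "u"], "T": ["t", "u"]},
]


def learn_correspondences(cognate_sets, target):
    """Count how often each (variety, segment) aligns with each target segment."""
    counts = defaultdict(Counter)
    for forms in cognate_sets:
        if target not in forms:
            continue  # sets with a gap in the target carry no evidence
        for lang, segments in forms.items():
            if lang == target:
                continue
            for other_seg, target_seg in zip(segments, forms[target]):
                counts[(lang, other_seg)][target_seg] += 1
    return counts


def predict_reflex(forms, counts):
    """Predict the target form column by column by summing the evidence."""
    n_columns = len(next(iter(forms.values())))
    prediction = []
    for i in range(n_columns):
        votes = Counter()
        for lang, segments in forms.items():
            votes.update(counts.get((lang, segments[i]), {}))
        # "?" marks a column for which no correspondence has been observed
        prediction.append(votes.most_common(1)[0][0] if votes else "?")
    return prediction


counts = learn_correspondences(COGNATE_SETS, target="T")
gap = {"A": ["p", "u", "n"], "B": ["b", "u", "n"]}  # no form attested for T
prediction = predict_reflex(gap, counts)
print(" ".join(prediction))
```

In this toy setting the predicted form can be written down before any new data are collected and later compared with an elicited form, which mirrors the preregistration logic described in the abstract.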

Keywords: prediction, word prediction, comparative method, regularity of sound change, computer-assisted language comparison, Western Kho-Bwa, preregistered research, reflex prediction

Résumé (translated from French)

While analysing a set of lexical data from Western Kho-Bwa (Sino-Tibetan/Trans-Himalayan) with a computer-assisted approach to historical language comparison, we observed gaps in the data, where one or more varieties had no attested form for a given concept. We applied a new workflow in which we combined traditional manual steps with automated approaches to predict the most likely phonetic form of the words missing from our dataset, using the information from regular correspondences. The result of this workflow was a list of words that could actually be verified, which we then preregistered. This constituted an experiment in which the list was compared with the reflexes subsequently discovered during fieldwork. In this study, we describe our workflow for predicting hypothetical words and the elicitation process during fieldwork, and then present the results of our reflex prediction experiment. On the basis of what we learned from this experiment, we identify four general benefits of word prediction in historical language comparison. This kind of work can (1) strengthen the transparency of linguistic research; (2) increase the efficiency of research methods in historical linguistics, both in the field and with secondary sources; (3) provide teachers and learners with practical examples of a wide range of linguistic phenomena, including the regularity of sound change; and (4) stimulate speakers’ interest in and engagement with their own linguistic heritage.

Zusammenfassung (translated from German)

While analysing lexical data from Western Kho-Bwa languages of the Sino-Tibetan or Trans-Himalayan language family with the help of a computer-assisted approach to historical language comparison, we came across gaps in the data, where one or more varieties had no attested form for certain concepts. We then used a new workflow in which we combined manual with automated steps to predict the most likely phonetic realisations of the missing forms in our data, drawing systematically on the information on sound correspondences with potentially cognate words. This procedure gave us a list of hypothetical reflexes of words previously identified as cognate, which we first preregistered as an experiment on the prediction of previously unobserved words and then compared with the actually attested word forms in the course of extended fieldwork. In this study we first describe the workflow with which hypothetical reflexes can be predicted, as well as the process of eliciting actual word forms during fieldwork, and then present the results of our reflex prediction experiment. Based on the experience we gained from this experiment, we identify four fundamental benefits that the active prediction of unknown word forms offers for historical language comparison. These comprise (1) the increased transparency of linguistic research, (2) the increased efficiency of fieldwork and source work, (3) the educational aspect, which provides teachers and learners with a variety of examples of linguistic phenomena, such as the regularity of sound change, and (4) the possibility of awakening speakers’ interest in their linguistic heritage.

Article outline

1. Introduction
2. Predicting reflexes of cognate words in Western Kho-Bwa languages
- 2.1 The Western Kho-Bwa languages
- 2.2 Background of the study
- 2.3 Workflow for reflex prediction
- 2.4 Elicitation
- 2.5 Evaluation
3. Results
- 3.1 General results
- 3.2 Specific results
4. Benefits of reflex prediction
5. Conclusion
Supplementary material
Acknowledgements
Notes
References

Cited by (12)

Blum, Frederic, Carlos Barrientos, Johannes Englisch, Robert Forkel, Simon J. Greenhill, Christoph Rzymski & Johann-Mattis List

2025. Lexibank 2: pre-computed features for large-scale lexical data. Open Research Europe 5 ► pp. 126 ff.

Wientzek, Tim

2025. Using feature vectors for automated phonological reconstruction and reflex prediction. Open Research Europe 5 ► pp. 174 ff.

Blum, Frederic, Carlos Barrientos, Adriano Ingunza & Johann-Mattis List

2024. Cognate reflex prediction as hypothesis test for a genealogical relation between the Panoan and Takanan language families. Scientific Reports 14:1

Lai, Yunfan

2024. Mutual predictiveness of sound correspondences for reconstruction and language subgrouping. Diachronica 41:5 ► pp. 635 ff.

List, Johann-Mattis

2023. Open Problems in Computational Historical Linguistics. Open Research Europe 3 ► pp. 201 ff.

List, Johann-Mattis

2023. Evolutionary Aspects of Language Change. In Evolutionary Thinking Across Disciplines [Synthese Library, 478], ► pp. 103 ff.

List, Johann-Mattis

2024. Open Problems in Computational Historical Linguistics. Open Research Europe 3 ► pp. 201 ff.

List, Johann-Mattis, Robert Forkel, Simon J. Greenhill, Christoph Rzymski, Johannes Englisch & Russell D. Gray

2022. Lexibank, a public repository of standardized wordlists with computed phonological and lexical features. Scientific Data 9:1

List, Johann-Mattis & Robert Forkel

2021. Automated identification of borrowings in multilingual wordlists. Open Research Europe 1 ► pp. 79 ff.

List, Johann-Mattis & Robert Forkel

2022. Automated identification of borrowings in multilingual wordlists. Open Research Europe 1 ► pp. 79 ff.

This list is based on CrossRef data as of 8 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.