In: Historical Linguistics 2019: Selected papers from the 24th International Conference on Historical Linguistics, Canberra, 1–5 July 2019
Edited by Bethwyn Evans, Maria Kristina Gallego and Luisa Miceli
[Current Issues in Linguistic Theory 367] 2024
pp. 74–108
Chapter 4. Solving Galton’s problem
Practical solutions for analysing language diversity and evolution
Published online: 21 November 2024
https://doi.org/10.1075/cilt.367.04bro
Abstract
Comparisons between languages can illuminate processes of language change by revealing meaningful associations between language features, or the influence of external factors on the patterns and rates of language change. But such comparisons raise statistical challenges: close relatives tend to be more similar to each other than to more distantly related languages, and languages in the same area are subject to many of the same influences. Observations made on different languages will therefore usually fail to meet the requirement of statistical independence assumed by standard statistical tests. This fundamental challenge of cross-cultural analysis, known as Galton’s problem, is no cause for despair, because there is a range of workable solutions using widely available data. This paper discusses practical solutions, including phylogenetic analysis, sister-pair comparisons and spatially structured models, that can be applied to analyses of language variation and change.
Article outline
- 1. What is Galton’s problem?
- 2. Who has Galton’s problem?
- 3. Practical solutions to the problem of relatedness
- 3.1 Solution 1: Use a phylogeny
- 3.2 Solution 2: Use taxonomic information
- 3.3 Solution 3: Sister pairs
- 4. Practical solutions to the problem of proximity
- 4.1 Solution 1: Grid-based spatial data
- 4.2 Solution 2: Distance between languages
- 5. Conclusions
- Acknowledgements
- References
