Evidence of Zipfian distributions in three sign languages
Published online: 2 April 2024
https://doi.org/10.1075/gest.23014.kim
One striking commonality between languages is that their word frequencies follow a Zipfian distribution: a power-law
distribution of word frequency. This distribution is found across languages, across speech genres, and within different parts of speech. The recurrence of
such distributions is thought to reflect cognitive and/or communicative pressures and to facilitate language learning. However,
research on Zipfian distributions has mostly been limited to spoken languages. In this study, we ask whether Zipfian distributions
are also found across signed languages, as expected if they reflect a universal property of human language. We find that sign
frequencies and ranks in three sign language corpora (BSL, DGS and NGT) show a Zipfian relationship, similar to that found in
spoken languages. These findings highlight the commonalities between spoken and signed languages, add to our understanding of the
use of signs, and show the prevalence of Zipfian distributions across language modalities, supporting the idea that they
facilitate language learning and communication.
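
As a rough illustration of what a Zipfian relationship between frequency and rank means, the sketch below ranks types by frequency and fits a straight line to log(frequency) against log(rank). Under a power law f(r) ∝ r^(-α), the slope of that line estimates -α. This is not the procedure used in the study: the function names, the gloss labels, and the toy data are all hypothetical, and a least-squares fit on log-log axes is only a quick diagnostic, not a rigorous test of a power law.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent type first (rank 1)."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def fit_loglog_slope(pairs):
    """Ordinary least-squares fit of log(frequency) on log(rank).

    Returns (slope, intercept). Under f(r) proportional to r**(-alpha),
    the slope estimates -alpha.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data (hypothetical glosses, not corpus data):
# the k-th type occurs roughly 120 / k times.
toy_tokens = []
for k in range(1, 21):
    toy_tokens.extend([f"SIGN-{k}"] * round(120 / k))

pairs = rank_frequency(toy_tokens)
slope, intercept = fit_loglog_slope(pairs)
print(f"estimated exponent alpha = {-slope:.2f}")
```

Because the toy frequencies were built to fall off as 1/rank, the estimated exponent comes out close to 1. With real corpus counts, the list of glosses would replace `toy_tokens`, and the sparse low-frequency tail would call for a more careful fitting method.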
Article outline
- Introduction
- Lexical frequency in sign languages
- Method
- Corpora
- The British Sign Language (BSL) corpus project
- The DGS-korpus project
- Corpus NGT
- Coding
- ELAN
- Exclusion and inclusion criteria for sign categories
- Sign categories included
- Fully lexical signs (core lexicon)
- Depicting constructions
- Pointing signs
- Buoys
- Gestures
- Excluded sign categories
- Uncertain signs
- Mouthing
- Extra-linguistic manual activity
- Fingerspelling
- Names
- Cued speech and initializations
- Collapsing over specific tokens within a sign type
- Creating a frequency distribution and assessing the fit to a “Zipfian” one
- Results
- What do the most frequent signs look like across the three corpora?
- Discussion
- Conclusions
- Acknowledgements
- Data and code availability
- Notes
- References
