In: Mathematical Modelling in Linguistics and Text Analysis: Theory and applications
Edited by Adam Pawłowski, Sheila Embleton, Jan Mačutek and Aris Xanthos
[Current Issues in Linguistic Theory 370] 2025
pp. 6–16
Testing diachronic measures of productivity using the Zipf-Mandelbrot law
Published online: 13 October 2025
https://doi.org/10.1075/cilt.370.01fel
Abstract
The productivity of patterns (be it morphological or syntactic) has long been studied in diachrony. However, the study of the relationship between productivity and token frequency has been hindered by the non-trivial dependence of productivity measures on the size of the sample from which they are computed. Several solutions have been offered in the literature, but they have not yet been systematically evaluated against one another. In this paper, we rely on the Zipf-Mandelbrot (ZM) distribution to generate synthetic data and perform a statistical evaluation of these methods. We find that many of these methods succeed in providing a measure that is independent of sample size, and that they are closely correlated with one another.
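A minimal sketch of the sample-size problem described above, assuming a finite Zipf-Mandelbrot distribution p(r) ∝ 1/(r + b)^a over a fixed number of types. The parameter values, the function names (zm_weights, sample_tokens, raw_measures, fixed_size_p) and the choice of measures are illustrative assumptions: the hapax/token ratio P = V1/N and averaging over fixed-size subsamples are two familiar examples, not necessarily the measures or the procedure evaluated in the chapter.

```python
import random
from collections import Counter


def zm_weights(n_types: int, a: float, b: float) -> list[float]:
    """Zipf-Mandelbrot probabilities p(r) proportional to 1 / (r + b)**a, r = 1..n_types."""
    raw = [1.0 / (r + b) ** a for r in range(1, n_types + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_tokens(n_tokens: int, weights: list[float], rng: random.Random) -> list[int]:
    """Draw n_tokens independent tokens; each token is labelled by the rank of its type."""
    return rng.choices(range(1, len(weights) + 1), weights=weights, k=n_tokens)


def raw_measures(tokens: list[int]) -> dict[str, float]:
    """Token count N, type count V, hapax count V1 and hapax/token ratio P = V1 / N."""
    counts = Counter(tokens)
    n = len(tokens)
    v1 = sum(1 for c in counts.values() if c == 1)
    return {"N": n, "V": len(counts), "V1": v1, "P": v1 / n}


def fixed_size_p(tokens: list[int], sample_size: int, n_draws: int,
                 rng: random.Random) -> float:
    """Average P over n_draws random subsamples of sample_size tokens (without replacement)."""
    if sample_size > len(tokens):
        raise ValueError("sample_size exceeds the number of available tokens")
    total = 0.0
    for _ in range(n_draws):
        total += raw_measures(rng.sample(tokens, sample_size))["P"]
    return total / n_draws


if __name__ == "__main__":
    rng = random.Random(1)
    # Illustrative (hypothetical) parameters only; not values fitted in the chapter.
    weights = zm_weights(n_types=5000, a=1.1, b=2.7)
    for n in (1_000, 10_000, 100_000):
        tokens = sample_tokens(n, weights, rng)
        m = raw_measures(tokens)
        # Same measure, but always computed on subsamples of 1,000 tokens.
        p_fixed = fixed_size_p(tokens, sample_size=1_000, n_draws=20, rng=rng)
        print(f"N={n:>7}  V={m['V']:>5}  raw P={m['P']:.3f}  P at N=1000: {p_fixed:.3f}")
```

Run as a script, the raw type count V is expected to rise and the raw P to fall as N grows, even though every sample comes from the same distribution, while the fixed-size average should stay roughly stable. This is the kind of sample-size dependence that makes raw values computed on corpora of different sizes hard to compare.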
Article outline
- 1. Introduction
- 2. Zipf’s law for constructions
- 2.1 Empirical datasets
- 2.2 Fitting the ZM distribution
- 2.3 Applying the ZM fit to schematic constructions
- 3. Statistical evaluation of diachronic productivity measures
- 3.1 Method
- 3.2 Productivity measures
- 3.3 Results
- 4. Discussion and limitations
References
. 2009. Corpus linguistics in morphology: Morphological productivity. In Anke Lüdeling & Merja Kytö (eds.), Corpus linguistics. An international handbook, 900–919. Berlin: Mouton de Gruyter.
Barðdal, Jóhanna, Renata Enghels, Quentin Feltgen, Sven Van Hulle & Peter Lauwers. 2024. Productivity in diachrony. In Adam Ledgeway, Edith Aldridge, Anne Breitbarth, Katalin É. Kiss, Joseph Salmons & Alexandra Simonenko (eds.), The Wiley Blackwell companion to diachronic linguistics. Hoboken: John Wiley & Sons, Inc.
Chipere, Ngoni, David Malvern & Brian Richards. 2004. Using a corpus of children’s writing to test a solution to the sample size problem affecting type-token ratios. In Guy Aston, Silvia Bernardini & Dominic Stewart (eds.), Corpora and language learners, 139–147. Amsterdam: John Benjamins.
Davies, Mark. 2008–. The corpus of contemporary American English (COCA). Available online at [URL]
Davies, Mark. 2010. The corpus of historical American English (COHA). Available online at [URL]
Ellis, Nick C. & Fernando Ferreira-Junior. 2009. Construction learning as a function of frequency, frequency distribution, and function. The Modern Language Journal 93(3). 370–385.
Evert, Stefan. 2004. A simple LNRE model for random character sequences. In Gérald Purnelle, Cédrick Fairon & Anne Dister (eds.), Proceedings of JADT 2004, 411–422. Louvain: Presses universitaires de Louvain.
Evert, Stefan & Marco Baroni. 2005. Testing the extrapolation quality of word frequency models. In Proceedings from the Corpus Linguistics Conference Series, Vol. 1, no. 1. Online: [URL]
Flach, Susanne. 2021. From Movement into Action to Manner of Causation: Changes in argument mapping in the into-causative. Linguistics 59(1). 247–283.
Gaeta, Livio & Davide Ricca. 2006. Productivity in Italian word formation: A variable-corpus approach. Linguistics 44(1). 57–89.
Hartmann, Stefan. 2018. Derivational morphology in flux: A case study of word-formation change in German. Cognitive Linguistics 29(1). 77–119.
Heaps, Harold S. 1978. Information retrieval: Computational and theoretical aspects. New York: Academic Press, Inc.
Koplenig, Alexander. 2018. Using the parameters of the Zipf-Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis. Corpus Linguistics and Linguistic Theory 14(1). 1–34.
Lü, Linyuan, Zi-Ke Zhang & Tao Zhou. 2010. Zipf’s law leads to Heaps’ law: Analyzing their relation in finite-size systems. PLoS ONE 5(12). e14139.
Lüdeling, Anke & Stefan Evert. 2005. The emergence of productive non-medical -itis: Corpus evidence and qualitative analysis. In Stephan Kepser & Marga Reis (eds.), Linguistic evidence. Empirical, theoretical, and computational perspectives, 351–370. Berlin: Mouton de Gruyter.
Pankratz, Elizabeth, Titus von der Malsburg & Shravan Vasishth. 2022. Shannon entropy is a more comprehensive and principled morphological productivity measure than the standard alternatives.
Perek, Florent. 2018. Recent change in the productivity and schematicity of the way-construction: A distributional semantic analysis. Corpus Linguistics and Linguistic Theory 14(1). 65–97.
Säily, Tanja. 2016. Sociolinguistic variation in morphological productivity in eighteenth-century English. Corpus Linguistics and Linguistic Theory 12(1). 129–151.
Sundquist, John D. 2020. Productivity, richness, and diversity of light verb constructions in the history of American English. Journal of Historical Linguistics 10(3). 349–388.
Tang, Kevin & Andrew Nevins. 2013. Quantifying the diachronic productivity of irregular verbal patterns in Romance. UCL Working Papers in Linguistics 25. 289–308.
Tunnicliffe, Martin & Gordon Hunter. 2021. The predictive capabilities of mathematical models for the type-token relationship in English language corpora. Computer Speech & Language 70. 101227.
Valdeson, Fredrik. 2022. Lexical variation in the double object construction in 19th and 20th century Swedish. In Ida Larsson & Erik M. Petzell (eds.), Morphosyntactic change in Late Modern Swedish, 99–144. Berlin: Language Science Press.
Van Wettere, Niek. 2022. The hapax/type ratio: An indicator of minimally required sample size in productivity studies? International Journal of Corpus Linguistics 27(2). 166–190.
