Article published in: International Journal of Corpus Linguistics
Vol. 27:2 (2022), pp. 166–190
The hapax / type ratio
An indicator of minimally required sample size in productivity studies?
Published online: 9 March 2022
https://doi.org/10.1075/ijcl.19114.van
Abstract
This article addresses one of the lesser-known productivity measures, namely the hapax / type ratio (HTR). Through a case study involving the Dutch semi-copula raken (“attain”), it is shown that the HTR more or less stabilizes from a certain sample size onwards. Moreover, this point of stabilization seems to coincide with an increased permanency of the hapaxes, i.e. the share of hapaxes that quickly convert to non-hapaxes is smaller than at the beginning of the sampling process. Therefore, the stabilization of the HTR might be a good indicator of the minimally required sample size in productivity studies, suggesting that the hapaxes are ‘non-incidental’ from this sample size onwards. However, I did not find a clear link between the onset of the stabilization of the HTR and the extent to which the inventory of types at the top of the frequency distribution is (quasi-)complete.
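To make the two quantities in the abstract concrete, the following is a minimal sketch, not the author's procedure. It assumes the HTR is the number of hapaxes (types occurring exactly once) divided by the number of types, and that the tokens are listed in the order in which they were sampled. The function names `hapax_type_ratio`, `htr_curve`, and `hapax_retention` are hypothetical, the retention measure is only one possible way to operationalize hapax "permanency", and the sample is invented toy data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hapax_type_ratio(tokens: Iterable[str]) -> float:
    """Return hapaxes / types for a sample (0.0 for an empty sample)."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for freq in counts.values() if freq == 1)
    return hapaxes / len(counts)


def htr_curve(tokens: List[str], step: int) -> List[Tuple[int, float]]:
    """HTR at cumulative sample sizes step, 2*step, ..., len(tokens)."""
    sizes = list(range(step, len(tokens) + 1, step))
    if not sizes or sizes[-1] != len(tokens):
        sizes.append(len(tokens))  # always include the full sample
    return [(n, hapax_type_ratio(tokens[:n])) for n in sizes]


def hapax_retention(tokens: List[str], n_early: int, n_late: int) -> float:
    """Share of hapaxes in the first n_early tokens that are still
    hapaxes in the first n_late tokens (assumed operationalization)."""
    early = Counter(tokens[:n_early])
    late = Counter(tokens[:n_late])
    early_hapaxes = [t for t, freq in early.items() if freq == 1]
    if not early_hapaxes:
        return 0.0
    kept = sum(1 for t in early_hapaxes if late[t] == 1)
    return kept / len(early_hapaxes)


# Toy, invented sample of slot fillers (not data from the article).
sample = ["a", "b", "a", "c", "d", "a", "b", "e"]
curve = htr_curve(sample, step=4)
retention = hapax_retention(sample, n_early=4, n_late=8)
```

On real data one would look for the sample size beyond which consecutive values in `curve` barely change, and check whether `retention` is higher from that point onwards; what counts as "barely" is a threshold the analyst has to choose.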
Article outline
- 1. Introduction
- 2. Quantitative measures gauging linguistic productivity
- 3. Focus on the hapax / type ratio
- 4. A sample-wide view on the shape of the hapax / type ratio
- 5. The case of the Dutch semi-copula raken and its hapax / type ratio
  - 5.1 Hapax stability
  - 5.2 Completeness of the frequency summit
- 6. Conclusions
- Acknowledgements
- Notes
- References
