Article published in: Natural language processing for learner corpus research
Edited by Kristopher Kyle
[International Journal of Learner Corpus Research 7:1] 2021
pp. 163–194
How operationalizations of word types affect measures of lexical diversity
Published online: 1 March 2021
https://doi.org/10.1075/ijlcr.20004.jar
Abstract
This study tests three measures of lexical diversity (LD), each using five operationalizations of word types. The measures
include MTLD (measure of textual lexical diversity), MTLD-W (moving average MTLD with wrap-around measurement), and MATTR (moving average
type-token ratio). Each of these measures is tested with types operationalized as orthographic forms, lemmas using automated POS tags,
lemmas using manually corrected POS tags, flemmas (list-based lemmas that do not distinguish between parts of speech), and word families.
These measures are applied to 60 narrative texts written in English by adolescent native speakers of English (n = 13),
Finnish (n = 31), and Swedish (n = 16). Each individual LD measure is evaluated in relation to how well it
correlates with the mean LD ratings of 55 human raters whose inter-rater reliability was exceedingly high (Cronbach’s alpha = .980). The
overall results show that the three measures are comparable but two of the operationalizations of types produce mixed results across
measures.
Keywords: lexical diversity, word types, MTLD, MTLD-W, MATTR
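To make the interaction between a measure and an operationalization of types concrete, the following is a minimal sketch of MATTR in Python. It is not the authors' implementation: the function name `mattr`, the `to_type` parameter, the default window of 50 tokens, and the toy lemma table are all assumptions introduced for illustration. The idea it shows is only that the same moving-average calculation can yield different scores depending on how tokens are mapped to types.

```python
from typing import Callable, List


def mattr(tokens: List[str], window: int = 50,
          to_type: Callable[[str], str] = str.lower) -> float:
    """Moving-average type-token ratio (illustrative sketch).

    `to_type` maps each token to the unit counted as a type
    (hypothetical parameter: e.g. the orthographic form or a lemma).
    `window` is an assumed default, not a value taken from the article.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    types = [to_type(t) for t in tokens]
    if not types:
        return 0.0
    # Texts shorter than one window fall back to a plain TTR.
    if len(types) <= window:
        return len(set(types)) / len(types)
    # TTR of every full window, sliding one token at a time.
    ttrs = [len(set(types[i:i + window])) / window
            for i in range(len(types) - window + 1)]
    return sum(ttrs) / len(ttrs)


# Toy lemma table (hypothetical); a real study would use a tagger/lemmatizer.
TOY_LEMMAS = {"cats": "cat", "sees": "see", "saw": "see"}


def toy_lemma(token: str) -> str:
    """Map a token to its lemma using the toy table above."""
    lowered = token.lower()
    return TOY_LEMMAS.get(lowered, lowered)


if __name__ == "__main__":
    text = "the cat sees the cats and the cat saw the cats".split()
    print("orthographic forms:", mattr(text, window=4))
    print("toy lemmas:        ", mattr(text, window=4, to_type=toy_lemma))
```

Swapping `to_type` for a flemma or word-family lookup would follow the same pattern; coarser groupings merge more tokens into one type and can therefore only lower or preserve the score for a given text.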
Article outline
- 1. Introduction
- 2. Challenges for the assessment of LD
- 2.1 Text length
- 2.2 Operationalization of types
- 2.3 Human ratings of lexical diversity
- 3. Research questions
- 4. Method
- 4.1 Corpus
- 4.2 Human ratings of LD
- 4.3 Automated measures of LD
- 4.4 Statistical analysis
- 5. Results
- 5.1 Tagger accuracy
- 5.2 Pearson correlations
- 5.3 Correlation comparisons by automated measure
- 5.4 Correlation comparisons by operationalization of word types
- 5.5 Linear regressions
- 5.6 Identifying outlier texts
- 6. Discussion and conclusions
- 6.1 Revisiting the research questions
- 6.2 Tagger accuracy
- 6.3 Construct validity
- 6.4 Conclusions
- Notes
- References
Cited by (10)
Bulté, Bram, Alex Housen & Gabriele Pallotti
Liu, Jinlu, Ming Wu & Haitao Liu
Bestgen, Yves
Bestgen, Yves
Díez-Ortega, María & Kristopher Kyle
Kyle, Kristopher & Masaki Eguchi
Sung, Hakyung, Sooyeon Cho & Kristopher Kyle
Akbary, Mary & Scott Jarvis
Vandeweerd, Nathan, Alex Housen & Magali Paquot
This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
