How operationalizations of word types affect measures of lexical diversity

Jarvis, Scott; Hashimoto, Brett James

doi:10.1075/ijlcr.20004.jar

Article published in: Natural language processing for learner corpus research
Edited by Kristopher Kyle
[International Journal of Learner Corpus Research 7:1] 2021, pp. 163–194

Scott Jarvis | University of Utah

Brett James Hashimoto | Brigham Young University

Published online: 1 March 2021

Abstract

This study tests three measures of lexical diversity (LD), each using five operationalizations of word types. The measures include MTLD (measure of textual lexical diversity), MTLD-W (moving average MTLD with wrap-around measurement), and MATTR (moving average type-token ratio). Each of these measures is tested with types operationalized as orthographic forms, lemmas using automated POS tags, lemmas using manually corrected POS tags, flemmas (list-based lemmas that do not distinguish between parts of speech), and word families. These measures are applied to 60 narrative texts written in English by adolescent native speakers of English (n = 13), Finnish (n = 31), and Swedish (n = 16). Each individual LD measure is evaluated in relation to how well it correlates with the mean LD ratings of 55 human raters whose inter-rater reliability was exceedingly high (Cronbach’s alpha = .980). The overall results show that the three measures are comparable, but two of the operationalizations of types produce mixed results across measures.

Keywords: lexical diversity, word types, MTLD, MTLD-W, MATTR
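
To illustrate how the operationalization of types can change a diversity score, the following is a minimal Python sketch of MATTR, computed once over orthographic forms and once over lemmas. It is not the authors' implementation. The window size, the `mattr` function, and the tiny `TOY_LEMMAS` table are assumptions made for illustration only; the study itself derives lemmas from POS-tagged text, flemmas from word lists, and word families from family lists.

```python
from typing import Callable, Sequence


def mattr(tokens: Sequence[str], window: int = 50,
          to_type: Callable[[str], str] = str.lower) -> float:
    """Moving-average type-token ratio.

    Maps each token to a type with `to_type`, then averages the
    type-token ratio over every window of `window` consecutive tokens.
    The default window of 50 is an assumption, not a value from the study.
    """
    types = [to_type(t) for t in tokens]
    if not types:
        raise ValueError("cannot compute MATTR for an empty text")
    if len(types) <= window:
        # Text shorter than one window: fall back to a plain TTR.
        return len(set(types)) / len(types)
    ratios = [len(set(types[i:i + window])) / window
              for i in range(len(types) - window + 1)]
    return sum(ratios) / len(ratios)


# Hypothetical toy lemma table, standing in for a real lemmatizer.
TOY_LEMMAS = {"ran": "run", "runs": "run", "running": "run", "dogs": "dog"}


def toy_lemma(token: str) -> str:
    """Map a token to its lemma using the toy table; unknown words pass through."""
    lowered = token.lower()
    return TOY_LEMMAS.get(lowered, lowered)


tokens = "the dog runs and the dogs ran while one dog was running".split()

# Types as orthographic forms: "runs", "ran", and "running" count separately.
orthographic = mattr(tokens, window=5)

# Types as lemmas: inflected forms of "run" and "dog" collapse into one type each.
lemmatized = mattr(tokens, window=5, to_type=toy_lemma)

print(f"MATTR (orthographic forms): {orthographic:.3f}")
print(f"MATTR (toy lemmas):         {lemmatized:.3f}")
```

On this toy sentence the lemma-based score comes out lower than the orthographic score, because inflected variants no longer count as distinct types. The same `to_type` hook could take a flemma or word-family lookup in place of the toy lemma table.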

Article outline

1. Introduction
2. Challenges for the assessment of LD
- 2.1 Text length
- 2.2 Operationalization of types
- 2.3 Human ratings of lexical diversity
3. Research questions
4. Method
- 4.1 Corpus
- 4.2 Human ratings of LD
- 4.3 Automated measures of LD
- 4.4 Statistical analysis
5. Results
- 5.1 Tagger accuracy
- 5.2 Pearson correlations
- 5.3 Correlation comparisons by automated measure
- 5.4 Correlation comparisons by operationalization of word types
- 5.5 Linear regressions
- 5.6 Identifying outlier texts
6. Discussion and conclusions
- 6.1 Revisiting the research questions
- 6.2 Tagger accuracy
- 6.3 Construct validity
- 6.4 Conclusions
Notes
References

Cited by (10)

Bulté, Bram, Alex Housen & Gabriele Pallotti

2025. Complexity and Difficulty in Second Language Acquisition: A Theoretical and Methodological Overview. Language Learning 75:2, pp. 533 ff.

Liu, Jinlu, Ming Wu & Haitao Liu

2025. Investigating the dynamic relationship of lexical and syntactic complexity in L2 writing across proficiency levels: A CDST-inspired study. System 131, pp. 103678 ff.

Bestgen, Yves

2024. Measuring Lexical Diversity in Texts: The Twofold Length Problem. Language Learning 74:3, pp. 638 ff.

Bestgen, Yves

2024. Back to Basics in Measuring Lexical Diversity: Too Simple to Be True. Applied Linguistics 45:5, pp. 926 ff.

Díez-Ortega, María & Kristopher Kyle

2024. Measuring the development of lexical richness of L2 Spanish: A longitudinal learner corpus study. Studies in Second Language Acquisition 46:1, pp. 169 ff.

Kyle, Kristopher & Masaki Eguchi

2024. Evaluating NLP models with written and spoken L2 samples. Research Methods in Applied Linguistics 3:2, pp. 100120 ff.

Sung, Hakyung, Sooyeon Cho & Kristopher Kyle

2024. An Empirical Evaluation of Lexical Diversity Indices in L2 Korean Writing Assessment. Language Assessment Quarterly 21:2, pp. 159 ff.

Akbary, Mary & Scott Jarvis

2023. Lexical diversity as a predictor of genre in TV shows. Digital Scholarship in the Humanities 38:3, pp. 921 ff.

Vandeweerd, Nathan, Alex Housen & Magali Paquot

2023. Proficiency at the lexis–grammar interface: Comparing oral versus written French exam tasks. Language Testing 40:3, pp. 658 ff.

Woods, Kelly, Brett Hashimoto & Earl K. Brown

2023. A multi-measure approach for lexical diversity in writing assessments: Considerations in measurement and timing. Assessing Writing 55, pp. 100688 ff.

This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.