Article published in: International Journal of Corpus Linguistics
Vol. 21:4 (2016), pp. 439–464
On the (non)utility of Juilland’s D to measure lexical dispersion in large corpora
Published online: 6 December 2016
https://doi.org/10.1075/ijcl.21.4.01bib
This paper examines the effectiveness of Juilland’s D as a measure of vocabulary dispersion in large corpora. In a series of experiments using the British National Corpus (BNC), we explore the influence of three variables: the number of corpus parts used to compute D, the frequency of the target word, and the distribution of the target word across those parts. The experiments demonstrate that the effective range of D is greatly reduced when computations are based on a large number of corpus parts: even words with highly skewed distributions receive D values that indicate a relatively uniform distribution. We also briefly explore an alternative measure, Gries’ DP (Gries 2008), and show that it is a more reliable and effective measure of dispersion in a large corpus divided into many parts. In conclusion, we discuss the implications of these findings for quantitative methods used to create vocabulary lists, as well as for research questions in other areas of corpus linguistics.
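The contrast between the two measures can be illustrated with a minimal sketch. It assumes the commonly cited definitions for equal-sized corpus parts: D = 1 − CV / √(n − 1), where CV is the population standard deviation of the per-part frequencies divided by their mean, and DP = 0.5 × Σ |observed proportion − expected proportion|. The function names and the toy counts are hypothetical and are not taken from the paper's BNC experiments.

```python
import math


def juilland_d(freqs):
    """Juilland's D for equal-sized corpus parts: 1 - CV / sqrt(n - 1)."""
    n = len(freqs)
    if n < 2 or sum(freqs) == 0:
        raise ValueError("need at least two parts and one occurrence")
    mean = sum(freqs) / n
    # Population standard deviation (divide by n, not n - 1).
    sd = math.sqrt(sum((f - mean) ** 2 for f in freqs) / n)
    return 1 - (sd / mean) / math.sqrt(n - 1)


def gries_dp(freqs, part_sizes=None):
    """Gries' DP: half the summed absolute difference between the observed
    share of the word in each part and the part's share of the corpus."""
    if part_sizes is None:
        part_sizes = [1] * len(freqs)  # assumption: equal-sized parts
    total_f = sum(freqs)
    total_s = sum(part_sizes)
    return 0.5 * sum(
        abs(f / total_f - s / total_s) for f, s in zip(freqs, part_sizes)
    )


def toy_counts(n_parts, n_occupied, per_part=10):
    """Hypothetical word that occurs only in the first n_occupied parts."""
    return [per_part] * n_occupied + [0] * (n_parts - n_occupied)


# The word always sits in 1% of the parts; only the number of parts grows.
for n_parts in (100, 1000, 10000):
    counts = toy_counts(n_parts, n_parts // 100)
    print(n_parts, round(juilland_d(counts), 3), round(gries_dp(counts), 3))
```

In this toy setting the word is confined to 1% of the parts throughout, so DP stays at 0.99. D, by contrast, rises from 0 with 100 parts to roughly 0.69 with 1,000 parts and roughly 0.90 with 10,000 parts, which is the kind of range compression the abstract describes.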
References (15)
Baker, P., & Egbert, J. (Eds.) (2016). Triangulating Methodological Approaches in Corpus-linguistic Research. New York, NY: Routledge.
Biber, D. (2012). Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory, 8(1), 9–37.
Biber, D., Egbert, J., Gray, B., Oppliger, R., & Szmrecsanyi, B. (Forthcoming). Variationist versus text-linguistic approaches to grammatical change in English: Nominal modifiers of head nouns. In M. Kytö & P. Pahta (Eds.), Cambridge Handbook of English Historical Linguistics. Cambridge: Cambridge University Press.
Brezina, V., & Gablasova, D. (2015). Is there a core general vocabulary? Introducing the New General Service List. Applied Linguistics, 36(1), 1–22.
Davies, M., & Gardner, D. (2010). A Frequency Dictionary of Contemporary American English: Word Sketches, Collocates, and Thematic Lists. London: Routledge.
Evert, S. (2004). The statistics of word co-occurrences: Word pairs and collocations (Unpublished doctoral dissertation). University of Stuttgart, Germany. Retrieved from [URL] (last accessed September 2016).
Gries, S. Th. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437.
Juilland, A. G., Brodin, D. R., & Davidovitch, C. (1970). Frequency Dictionary of French Words. The Hague: Mouton de Gruyter.
Juilland, A., & Chang-Rodriguez, E. (1964). Frequency Dictionary of Spanish Words. The Hague: Mouton de Gruyter.
Leech, G., Rayson, P., & Wilson, A. (2001). Word Frequencies in Written and Spoken English: Based on the British National Corpus. London: Longman.
