Article published in: International Journal of Corpus Linguistics
Vol. 25:1 (2020), pp. 89–115
Lexical dispersion and corpus design
Published online: 16 April 2020
https://doi.org/10.1075/ijcl.18010.egb
Abstract
Lexical dispersion is typically measured across arbitrary corpus parts of equal size. In this study, we apply
DA – a new dispersion index designed for unequal-sized corpus parts – to the British National
Corpus (BNC) in a series of case studies to show that the dispersion of a word is strongly influenced by the corpus units or
parts it is measured across. Our results show that dispersion should be measured and interpreted based on corpus units that are
linguistically meaningful for a particular research goal. We conclude with recommendations to help researchers select meaningful
corpus units for measuring and interpreting lexical dispersion.
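
The abstract does not state the formula for DA. As a minimal sketch, assume DA is one minus the mean pairwise absolute difference between the per-part normalized rates of a word, divided by twice the mean rate. That formulation, the function name `dispersion_da`, and the toy counts below are all assumptions for illustration and should be checked against the article body before use.

```python
from itertools import combinations


def dispersion_da(counts, sizes):
    """Sketch of a DA-style dispersion index (assumed formulation).

    counts: occurrences of the word in each corpus part
    sizes:  number of tokens in each corpus part (parts may differ in size)
    Returns a value between 0 (all occurrences in one part) and 1 (even rates).
    """
    if len(counts) != len(sizes) or len(counts) < 2:
        raise ValueError("need matching counts and sizes for at least two parts")
    if any(size <= 0 for size in sizes):
        raise ValueError("every corpus part must contain at least one token")

    # Normalize by part size so that unequal-sized parts are comparable.
    rates = [count / size for count, size in zip(counts, sizes)]
    mean_rate = sum(rates) / len(rates)
    if mean_rate == 0:
        raise ValueError("the word does not occur in any part")

    # Mean absolute difference over all unordered pairs of parts.
    pairs = list(combinations(rates, 2))
    mean_abs_diff = sum(abs(a - b) for a, b in pairs) / len(pairs)

    return 1 - mean_abs_diff / (2 * mean_rate)


# Hypothetical toy data: three parts of 1,000, 2,000, and 3,000 tokens.
part_sizes = [1000, 2000, 3000]
even = dispersion_da([10, 20, 30], part_sizes)    # same rate in every part
clumped = dispersion_da([60, 0, 0], part_sizes)   # all occurrences in one part
print(even, clumped)
```

Because the index works on rates rather than raw counts, the same function can be applied to arbitrary equal-sized chunks or to linguistically meaningful units such as texts, which is the comparison the abstract describes.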
Keywords: word frequency lists, corpus design, text, mode, DA
Article outline
- 1. Introduction
- 2. Measuring dispersion
  - 2.1 Arbitrary corpus parts
  - 2.2 Linguistically meaningful corpus parts
- 3. Indices of dispersion
- 4. Case studies – Dispersion in the BNC
  - 4.1 Methods
    - 4.1.1 Corpus parts
    - 4.1.2 Words used in the analysis
  - 4.2 Dispersion across arbitrary, equal-sized corpus parts
  - 4.3 Dispersion across linguistically meaningful corpus parts
- 5. Discussion and conclusion
- Notes
