Article published in: International Journal of Corpus Linguistics
Vol. 25:2 (2020), pp. 125–155
Key words when text forms the unit of study
Sizing up the effects of different measures
Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 28 August 2020
https://doi.org/10.1075/ijcl.18053.jea
Abstract
Throughout the social sciences, there has been growing pressure to present effect sizes when publishing empirical data (see American Psychological Association, 2001; Parsons & Nelson, 2004). While it seems indisputable that for the majority of quantitative research foci, effect size is an essential element of statistical analysis, this paper argues that specifically for key word analysis in corpus linguistics, the means of reporting effect size must depend on the level of the unit of study of each investigation (single text, collection or large corpus). After exploring some main criticisms of the log-likelihood measure, this paper unpacks the parameters of different measures for keyness and how they might address underlying concerns. It maintains that for the exploration of foregrounded/deviant/salient/marked features in text, the use of log-likelihood scores to rank the results is still fit for purpose and, coupled with Bayes Factors, is a solid approach for key word analyses.
Keywords: keyness, effect size, key word analysis, log-likelihood, ranking
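For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of how they are commonly computed for a single word, not a reproduction of the article's own procedure. It assumes the two-cell log-likelihood formulation associated with Rayson & Garside (2000), the %DIFF definition associated with Gabrielatos & Marchi (2012), and a BIC-style approximation to the Bayes Factor (log-likelihood minus the natural log of the combined corpus size, with one degree of freedom). The function and parameter names are illustrative only.

```python
import math


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Two-cell log-likelihood (G2) for one word across two corpora."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def percent_diff(freq_study, freq_ref, size_study, size_ref):
    """%DIFF: percentage difference between normalised frequencies."""
    norm_study = freq_study / size_study
    norm_ref = freq_ref / size_ref
    if norm_ref == 0:
        # Assumption: report infinity when the word is absent from the
        # reference corpus; tools often substitute a tiny constant instead.
        return math.inf
    return (norm_study - norm_ref) * 100 / norm_ref


def bayes_factor_bic(ll, total_size, df=1):
    """BIC-style approximation: LL minus df * ln(N). Assumed form."""
    return ll - df * math.log(total_size)


if __name__ == "__main__":
    # Hypothetical counts: 20 hits in a 1,000-word study text,
    # 10 hits in a 1,000-word reference corpus.
    ll = log_likelihood(20, 10, 1000, 1000)
    print("LL:", round(ll, 3))
    print("%DIFF:", round(percent_diff(20, 10, 1000, 1000), 1))
    print("BIC approx.:", round(bayes_factor_bic(ll, 2000), 3))
```

Under these assumptions, the log-likelihood score reflects the strength of evidence for a frequency difference, %DIFF reflects its relative size, and the BIC-style value discounts the log-likelihood by the amount of data behind it.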
Article outline
- 1. Introduction
- 2. Analysis
- 2.1 Defining keyness
- 2.2 Two measures of keyness: LL and %DIFF
- 2.3 Determining appropriate measures for keyness
- 2.4 Parameters used in different measures
- 2.5 Rank frequency distributions of Candidate KWs
- 3. Implications
- 4. Conclusion
- Acknowledgements
- Notes
- References
References (48)
Anthony, L. (2019). AntConc (Version 3.5.8) [Computer software]. Waseda University. [URL]
American Psychological Association. (2001). Publication Manual of the American Psychological Association (5th ed.). American Psychological Association.
Baker, P. (2004). Querying keywords: Questions of difference, frequency, and sense in keywords analysis. Journal of English Linguistics, 32(4), 346–359.
Baker, P., Gabrielatos, C., Khosravinik, M., Krzyżanowski, M., McEnery, T., & Wodak, R. (2008). A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & Society, 19(3), 273–306.
Brezina, V., McEnery, T., & Wattam, S. (2015). Collocations in context: A new perspective on collocation networks. International Journal of Corpus Linguistics, 20(2), 139–173.
Cobb, T. (2000). The Compleat Lexical Tutor (Version 8.3) [Computer software]. Retrieved November, 2019, from [URL]
Croft, W. B., Metzler, D., & Strohman, T. (2010). Search Engines: Information Retrieval in Practice. Addison-Wesley.
Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.
Egbert, J., & Biber, D. (2019). Incorporating text dispersion into keyword analyses. Corpora, 14(1), 77–104.
Gabrielatos, C. (2018). Keyness analysis: Nature, metrics and techniques. In C. Taylor & A. Marchi (Eds.), Corpus Approaches to Discourse: A Critical Review. Routledge.
Gabrielatos, C., & Marchi, A. (2012). Keyness: Appropriate metrics and practical issues [Paper presentation]. CADS International Conference 2012, University of Bologna, Italy. [URL]
Gabrielatos, C., Torgersen, E. N., Hoffmann, S., & Fox, S. (2010). A corpus-based sociolinguistic study of indefinite article forms in London English. Journal of English Linguistics, 38(4), 297–334.
Grissom, R. J., & Kim, J. J. (2012). Effect Sizes for Research: Univariate and Multivariate Applications. Routledge.
Hardie, A. (2012). CQPweb: Combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics, 17(3), 380–409.
Hardie, A. (2014a). Log Ratio – an informal introduction. ESRC Centre for Corpus Approaches to Social Science (CASS). [URL]
Hardie, A. (2014b). Statistical identification of keywords, lockwords and collocations as a two-step procedure [Paper presentation]. ICAME 35 Conference, University of Nottingham, Nottingham, UK.
Jeaco, S. (2017). Concordancing lexical primings: The rationale and design of a user-friendly corpus tool for English language teaching and self-tutoring based on the Lexical Priming theory of language. In M. Pace-Sigge & K. J. Patterson (Eds.), Lexical Priming: Applications and Advances. John Benjamins.
Johnston, J. E., Berry, K. J., & Mielke Jr, P. W. (2006). Measures of effect size for chi-squared and likelihood-ratio goodness-of-fit tests. Perceptual and Motor Skills, 103(2), 412–414.
Kass, R. E., & Raftery, A. E. (1995). Bayes Factors. Journal of the American Statistical Association, 90(430), 773.
Kilgarriff, A., Rychly, P., Smrz, P., & Tugwell, D. (2004). The Sketch Engine [Paper presentation]. The 2003 International Conference on Natural Language Processing and Knowledge Engineering, Beijing, China.
Lee, D. Y. W. (2001). Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning and Technology, 5(3), 37–72.
Leech, G. N., Hundt, M., Mair, C., & Smith, N. (2009). Change in Contemporary English: A Grammatical Study. Cambridge University Press.
Leech, G. N., & Short, M. H. (2007). Style in Fiction: A Linguistic Introduction to English Fictional Prose (2nd ed.). Pearson Longman. (Original work published 1981)
Lexical Computing Ltd. (2014). Statistics used in the Sketch Engine. [URL]
Mahlberg, M., Stockwell, P., de Joode, J., Smith, C., & O’Donnell, M. B. (2016). CLiC Dickens: Novel uses of concordances for the integration of corpus stylistics and cognitive poetics. Corpora, 11(3), 433–463.
Parsons, T. D., & Nelson, N. W. (2004). Paradigm shift in social science research: A significance testing and effect size estimation rapprochement? PsycCRITIQUES, 491(Suppl 3).
Partington, A. (2010). Modern Diachronic Corpus-Assisted Discourse Studies (MD-CADS) on UK newspapers: An overview of the project. Corpora, 5(2), 83–108.
Plonsky, L., & Oswald, F. L. (2014). How big is “big”? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912.
Raftery, A. E. (1986). A note on Bayes Factors for Log-Linear contingency table models with vague prior information. Journal of the Royal Statistical Society. Series B (Methodological), 48(2), 249–250.
Rayson, P. (n.d.). UCREL Log-likelihood and effect size calculator. Retrieved November, 2019, from [URL]
Rayson, P. (2008). From key words to key semantic domains. International Journal of Corpus Linguistics, 13(4), 519–549.
Rayson, P., Berridge, D., & Francis, B. (2004). Extending the Cochran rule for the comparison of word frequencies between corpora [Paper presentation]. The 7th International Conference on Statistical Analysis of Textual Data, Louvain-la-Neuve, Belgium. [URL]
Rayson, P., & Garside, R. (2000). Comparing corpora using frequency profiling [Paper presentation]. The Workshop on Comparing Corpora, Hong Kong University of Science and Technology, Hong Kong. [URL]
Rayson, P., Leech, G., & Hodges, M. (1997). Social differentiation in the use of English vocabulary: Some analyses of the conversational component of the British National Corpus. International Journal of Corpus Linguistics, 2(1), 133–152.
Read, T. R. C., & Cressie, N. A. C. (1988). Goodness-of-fit Statistics for Discrete Multivariate Data. Springer.
Scott, M. (2019a). WordSmith Tools online manual “KeyWords: Calculation”. Retrieved November, 2019, from [URL]
Scott, M. (2019b). WordSmith Tools online manual “KeyWords”. Retrieved November, 2019, from [URL]
Scott, M. (2019c). WordSmith Tools online manual “KeyWords: Thinking about keyness”. Retrieved November, 2019, from [URL]
Scott, M. (2019d). WordSmith Tools online manual “KeyWords: Keyness definition”. Retrieved November, 2019, from [URL]
Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. John Benjamins.
Cited by (8)
Pojanapunya, Punjaporn
Ballance, Oliver J. & Averil Coxhead
Malory, Beth
Malory, Beth
Jeaco, Stephen
Jeaco, Stephen
2023. How can we communicate (visually) what we (usually) mean by collocation and keyness? Journal of Second Language Studies 6:1, pp. 29 ff.
This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
