Article published in: International Journal of Corpus Linguistics
Vol. 26:3 (2021), pp. 305–335
Problematising characteristicness
A biomedical association case study
Published online: 23 July 2021
https://doi.org/10.1075/ijcl.19019.pre
Abstract
Keyness is a commonly used method in corpus linguistics and is assumed to identify key items that are characteristic of one corpus when compared to another. This paper puts this assumption to the test by comparing case study corpora in the fields of genetic, immunological and psychiatric biomedical association studies, using what we refer to as a ‘K-FLUX’ analysis to produce a set of key items. Experts from within these fields are asked to evaluate the extent to which the identified key items are characteristic of their discipline. The paper concludes that fewer than 50% of the items identified by the method are rated as highly characteristic by experts, and that this proportion varies between types of association study. Further, there is difficulty in reaching a consensus over what is deemed to be ‘characteristic’, which poses a challenge to the ultimate aim of the keyness method. The paper demonstrates the value of supporting corpus linguistic studies with expert assessments to evaluate whether (and which) items can be said to be indicative of a particular field.
Keywords: key items, keyness, characteristic, evaluation, biomedical
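To make the idea of keyness concrete, the following minimal sketch scores items by the log-likelihood statistic, a widely used keyness measure, and ranks those that are relatively more frequent in a study corpus than in a reference corpus. It is an illustration only and does not reproduce the ‘K-FLUX’ analysis, whose details are not given in the abstract. The function names `log_likelihood` and `key_items` and the toy frequency counts are assumptions introduced for this example.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness score for one item across two corpora.

    freq_* are the item's frequencies; size_* are the corpus sizes in tokens.
    """
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected frequencies if the item were spread evenly across both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def key_items(study_counts, ref_counts):
    """Rank items that are relatively more frequent in the study corpus.

    Assumes both corpora are non-empty (hypothetical helper for illustration).
    """
    size_study = sum(study_counts.values())
    size_ref = sum(ref_counts.values())
    scored = []
    for item, freq in study_counts.items():
        ref_freq = ref_counts.get(item, 0)
        # Keep only positively key items.
        if freq / size_study > ref_freq / size_ref:
            scored.append((item, log_likelihood(freq, size_study, ref_freq, size_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy counts, invented purely for illustration.
study = {"allele": 30, "the": 100, "study": 20}
reference = {"the": 100, "study": 25, "symptom": 25}
ranked = key_items(study, reference)
```

A high score indicates only that an item's frequency differs markedly between the two corpora. Whether such an item is genuinely characteristic of a field is the separate question that the expert evaluations in this paper address.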
Article outline
- 1. Introduction
- 2. Using keyness to determine characteristicness
- 3. Data
- 4. Words, lemmas and word families
- 5. Generating key items for evaluation
- 5.1 Procedure
- 5.2 Results
- 6. Evaluation studies
- 6.1 Study 1: Pilot study
- 6.1.1 Procedure
- 6.1.2 Results and discussion
- 6.2 Study 2: Wider evaluative study
- 6.2.1 Procedure
- 6.2.2 Results
- 6.2.3 Discussion
- 7. General discussion and conclusion
- Notes
- References
