Problematising characteristicness: A biomedical association case study

Prentice, Sheryl; Knight, Jo; Rayson, Paul; Haj, Mahmoud El; Rutherford, Nathan

doi:10.1075/ijcl.19019.pre

Article published in: International Journal of Corpus Linguistics
Vol. 26:3 (2021), pp. 305–335

Sheryl Prentice | Lancaster University

Jo Knight | Lancaster University

Paul Rayson | Lancaster University

Mahmoud El Haj | Lancaster University

Nathan Rutherford | Royal Holloway University of London

Published online: 23 July 2021

https://doi.org/10.1075/ijcl.19019.pre

Abstract

Keyness is a commonly used method in corpus linguistics and is assumed to identify key items that are characteristic of one corpus when compared to another. This paper puts this assumption to the test by comparing case study corpora in the fields of genetic, immunological and psychiatric biomedical association studies, using what we refer to as a ‘K-FLUX’ analysis to produce a set of key items. Experts from within these fields are asked to evaluate the extent to which the identified key items are characteristic of their discipline. The paper concludes that experts rate fewer than 50% of the items identified by the method as highly characteristic, and that this proportion varies between types of association study. Further, there is difficulty in reaching a consensus over what is deemed to be ‘characteristic’, which poses a challenge to the ultimate aim of the keyness method. The paper demonstrates the value of supporting corpus linguistic studies with expert assessments to evaluate whether (and which) items can be said to be indicative of a particular field.

Keywords: key items, keyness, characteristic, evaluation, biomedical

Article outline

1. Introduction
2. Using keyness to determine characteristicness
3. Data
4. Words, lemmas and word families
5. Generating key items for evaluation
- 5.1 Procedure
- 5.2 Results
6. Evaluation studies
- 6.1 Study 1: Pilot study
  - 6.1.1 Procedure
  - 6.1.2 Results and discussion
- 6.2 Study 2: Wider evaluative study
  - 6.2.1 Procedure
  - 6.2.2 Results
  - 6.2.3 Discussion
7. General discussion and conclusion
Notes
References

Cited by (4)

Fioravanti, Irene. 2025. Connecting Corpus Linguistics and Psycholinguistics. In Exploration of the Intersection of Corpus Linguistics and Language Science, pp. 1 ff.

López-Rodríguez, Clara Inés. 2022. Emotion at the end of life: Semantic annotation and key domains in a pilot study audiovisual corpus. Lingua 277, pp. 103401 ff.

Prentice, Sheryl, Paul Rayson, Jo Knight, Mahmoud El-Haj & Solly Elstein. 2022. A Domain Based Approach to Semantic Lexicon Expansion. International Journal of Lexicography 35:3, pp. 364 ff.

Prentice, Sheryl & Paul J. Taylor. 2021. Poles Apart? The Extent of Similarity Between Online Extremist and Non-extremist Message Content. Frontiers in Psychology 12.

This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.