In: Digital and Internet-Based Research Methods in Applied Linguistics
Edited by Matt Kessler
[Research Methods in Applied Linguistics 15] 2026
pp. 338–361
Chapter 16. Automated text analyzers
Published online: 5 January 2026
https://doi.org/10.1075/rmal.15.16sha
Abstract
This chapter introduces (web-based) automated text analyzers (ATAs) in applied linguistics
research. It begins by briefly surveying key strands of research involving ATAs and outlining three types of text
analysis alongside questions commonly addressed by these tools. The core of this chapter presents a conceptual
framework for the typology and implementation of ATAs, structured around four continuum dimensions: (1)
pre-built corpus platform vs. custom corpus platform, (2) developer-oriented vs.
user-oriented, (3) focused vs. versatile, and (4) descriptive vs.
interpretive. The framework is illustrated through five practical studies showcasing the application of
various types of ATAs in applied linguistics research, including L2SCA, Coh-Metrix, Sketch Engine, #LancsBox, Voyant
Tools, and Wmatrix3. The chapter also discusses ethical considerations and methodological challenges associated with
ATA use. It concludes by outlining future directions for ATA development and research, including improving annotation
accuracy, enhancing qualitative interpretability, and expanding analytical capacities across languages and
modalities.
Article outline
- 1. Introduction
- 2. Frequently asked research questions
  - Exploratory
  - Predictive
  - Inferential
- 3. Implementation
  - Dimension 1: Pre-built corpus platform vs. custom corpus platform
  - Dimension 2: User-oriented vs. developer-oriented tools
  - Dimension 3: Focused vs. versatile function
  - Dimension 4: Descriptive vs. interpretative focus
- 4. Example studies
  - Kim and Lu (2024b)
  - Polio and Yoon (2018)
  - Taylor (2021)
  - Elmas et al. (2025)
  - Breeze (2019)
- 5. Ethics and research integrity considerations
- 6. Challenges and issues
- 7. Future research directions
References
References (36)
Alexopoulou, T., Michel, M., Murakami, A., & Meurers, D. (2017). Task
effects on linguistic complexity and accuracy: A large-scale learner corpus analysis employing natural
language processing techniques. Language
Learning, 67(S1), 180–208.
Anthony, L. (2022). What
can corpus software do? In A. O’Keeffe & M. McCarthy (Eds.), The
Routledge handbook of corpus linguistics (2nd
ed., pp. 103–125). Routledge.
Baker, P., & McEnery, A. (Eds.). (2015). Corpora
and discourse studies: Integrating discourse and corpora. Palgrave Macmillan.
Bednarek, M. (2015). Corpus-assisted
multimodal discourse analysis of television and film
narratives. In P. Baker & A. McEnery (Eds.), Corpora
and discourse studies: Integrating discourse and
corpora (pp. 63–87). Palgrave Macmillan.
Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied
text analysis with Python: Enabling language-aware data products with machine
learning. O’Reilly Media.
Breeze, R. (2019). Emotion
in politics: Affective-discursive practices in UKIP and Labour. Discourse &
Society, 30(1), 24–43.
Brezina, V., & Platt, W. (2024). #LancsBox
X (Version 5.0.3) [Computer
Software]. Lancaster University. [URL]
Buck, A. M., & Ralston, D. F. (2021). I
didn’t sign up for your research study: The ethics of using “public”
data. Computers and
Composition, 61, 102655.
Chen, Y. H., & Baker, P. (2016). Investigating
criterial discourse features across second language development: Lexical bundles in rated learner essays, CEFR
B1, B2 and C1. Applied
Linguistics, 37(6), 849–880.
Choi, J., & Crossley, S. A. (2022). Advances
in readability research: Automated readability web app for
English. In Proceedings of the 2022 International
Conference on Advanced Learning
Technologies (pp. 1–5). IEEE.
Crossley, S. A., & Kim, M. (2022). Linguistic
features of writing quality and development: A longitudinal approach. The
Journal of Writing
Analytics, 6(1), 59–93.
Elmas, T., Yılmaz, F., & Gürbüz, N. (2025). “Refugees
from Ukraine are called humans”: A corpus-based critical discourse analysis of Turkish tweets about Ukrainian
refugees. Media, Culture &
Society, 47(1), 75–95.
Flowerdew, J., & Richardson, J. E. (Eds.). (2018). The
Routledge handbook of critical discourse
studies. Routledge.
Francom, J. (2025). An
introduction to quantitative text analysis for linguistics: Reproducible research using
R. Taylor & Francis.
Hardie, A. (2012). CQPweb
— combining power, flexibility and usability in a corpus analysis
tool. International Journal of Corpus
Linguistics, 17(3), 380–409.
Hunt, D., & Harvey, K. (2015). Health
communication and corpus linguistics: Using corpus tools to analyse eating disorder discourse
online. In P. Baker & A. McEnery (Eds.), Corpora
and discourse studies: Integrating discourse and
corpora (pp. 134–154). Palgrave Macmillan.
Jin, T., Lu, X., Guo, K., Li, B., Liu, F., Deng, Y., Wu, J., & Chen, G. (2021). Eng-Editor:
An online English text evaluation and adaptation
system. LanguageData. [URL]
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014). The
Sketch Engine: Ten years
on. Lexicography, 1(1), 7–36.
Kim, M., & Lu, X. (2024a). Exploring
the potential of using ChatGPT for rhetorical move-step analysis: The impact of prompt refinement, few-shot
learning, and fine-tuning. Journal of English for Academic
Purposes, 71, 101422.
Kim, M., & Lu, X. (2024b). L2
English speaking syntactic complexity: Data preprocessing issues, reliability of automated analysis, and the
effects of proficiency, L1 background, and topic. The Modern Language
Journal, 108(1), 270–296.
Kyle, K., Crossley, S., & Verspoor, M. (2021). Measuring
longitudinal writing development using indices of syntactic complexity and
sophistication. Studies in Second Language
Acquisition, 43(4), 781–812.
Lu, X. (2010). Automatic
analysis of syntactic complexity in second language writing. International
Journal of Corpus
Linguistics, 15(4), 474–496.
Lu, X. (2021). Directions
for future automated analyses of L2 written
texts. In The Routledge handbook of second language
acquisition and
writing (pp. 370–382). Routledge.
Lu, X. (2022). What
can corpus software reveal about language
development? In A. O’Keeffe & M. McCarthy (Eds.), The
Routledge handbook of corpus linguistics (2nd
ed., pp. 155–167). Routledge.
Mautner, G. (2022). What
can a corpus tell us about discourse? In A. O’Keeffe & M. McCarthy (Eds.), The
Routledge handbook of corpus linguistics (2nd
ed., pp. 250–262). Routledge.
McNamara, D. S., Graesser, A. C., McCarthy, P., & Cai, Z. (2014). Automated
evaluation of text and discourse with Coh-Metrix. Cambridge University Press.
O’Keeffe, A., & McCarthy, M. J. (Eds.). (2022). The
Routledge handbook of corpus linguistics (2nd
ed.). Routledge.
Polio, C., & Yoon, H. J. (2018). The
reliability and validity of automated tools for examining variation in syntactic complexity across
genres. International Journal of Applied
Linguistics, 28(1), 165–188.
Potts, A. (2015). Filtering
the flood: Semantic tagging as a method of identifying salient discourse topics in a large corpus of Hurricane
Katrina reportage. In P. Baker & A. McEnery (Eds.), Corpora
and discourse studies: Integrating discourse and
corpora (pp. 285–304). Palgrave Macmillan.
Sinclair, S., & Rockwell, G. (2016). Voyant-tools. [URL]
Srinivasa-Desikan, B. (2018). Natural
language processing and computational linguistics: A practical guide to text analysis with Python, Gensim,
spaCy, and Keras. Packt Publishing.
Taylor, C. (2021). Investigating
gendered language through collocation: The case of mock
politeness. In J. Angouri & J. Baxter (Eds.), The
Routledge handbook of language, gender, and
sexuality (pp. 572–586). Routledge.
