In: Technology and Instructed Second Language Acquisition: Connecting research and pedagogy
Edited by Shawn Loewen, Frederick J. Poole, Hyun-Bin Hwang and Matthew D. Coss
[Language Learning & Language Teaching 63] 2025
pp. 115–138
Chapter 5. Corpora and instructed second language acquisition
A useful methodological synergy?
Published online: 27 October 2025
https://doi.org/10.1075/lllt.63.05cro
Abstract
The use of language corpora for language teaching and learning, typically known as ‘data-driven learning’
(DDL; Johns, 1991) when used within instructed second language acquisition (ISLA)
contexts, continues to be a productive area of applied linguistic research. However, while using language corpus data to learn
(and teach) language might appear to be “a useful methodological synergy” (Baker et al.,
2008), penetration of DDL and corpus-based language pedagogy into mainstream language education is still limited,
and, given the generative AI era we now find ourselves in, is at risk of being overshadowed by newer
technologies. This chapter outlines the continued case for using corpora in the classroom, covering cutting-edge DDL corpus
tools and pedagogical resources, theories underpinning what makes DDL ‘work’ in ISLA contexts, exemplary DDL studies, and
areas ripe for DDL that are yet to be explored.
Article outline
- Introduction
- Overview and typology of corpus tools for L2 learning
- Theories and research
- Theoretical perspectives
- ISLA-focused research
- Exemplary corpus and instructed SLA study
- Generalizability and representativeness
- Pedagogical applications and considerations
- Recommendation 1 — right tool, right data
- Recommendation 2 — use open educational resources
- Recommendation 3 — Drop the concordances (if necessary)
- Pedagogical scenario for reflection
- Extension resources
- Reflection questions
- Topics for (action) research projects
- Effectiveness of corpus-based instruction
- Error analysis
- Vocabulary acquisition
- Generative AI use acknowledgement
References (77)
Anthony, L. (2005, March). AntConc:
A learner and classroom friendly, multi-platform corpus analysis
toolkit. In IWLeL 2004: An interactive workshop on language
e-learning (pp. 7–13). Waseda University.
Anthony, L. (2013). A
critical look at software tools in corpus linguistics. Linguistic
Research, 30(2), 141–161.
Anthony, L. (2021). Programming
for corpus linguistics. In M. Paquot & S. T. Gries (Eds.), A
practical handbook of corpus
linguistics (pp. 181–207). Springer.
Anthony, L. (2022). What
can corpus software do? In A. O’Keeffe & M. J. McCarthy (Eds.), The
Routledge handbook of corpus
linguistics (pp. 103–125). Routledge.
Aşık, A. (2017). A
sample corpus integration in language teacher education through coursebook
evaluation. Journal of Language and Linguistic
Studies, 13(2), 728–740.
Baker, P., Gabrielatos, C., Khosravinik, M., Krzyżanowski, M., McEnery, T., & Wodak, R. (2008). A
useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of
refugees and asylum seekers in the UK press. Discourse &
Society, 19(3), 273–306.
Bednarek, M., Schweinberger, M., & Lee, K. (2024). Corpus-based
discourse analysis: From meta-reflection to accountability. Corpus Linguistics and
Linguistic
Theory, 20(3), 539–566.
Boontam, P., & Phoocharoensil, S. (2018). Effectiveness
of English preposition learning through data-driven learning (DDL). Southeast Asian Journal
of English Language
Studies, 24(3), 125–141.
Boulton, A. (2010). Learning
outcomes from corpus consultation. In M. Moreno Jaén, F. Serrano Valverde, & M. Calzada Pérez (Eds.), Exploring
new paths in language pedagogy: Lexis and corpus-based language
teaching (pp. XX–XX). Equinox.
Boulton, A., & Cobb, T. (2017). Corpus
use in language learning: A meta-analysis. Language
Learning, 67(2), 348–393.
Boulton, A., & Tyne, H. (2013). Corpus
linguistics and data-driven learning: A critical overview. Bulletin Suisse de
Linguistique
Appliquée, 97, 97–118.
Boulton, A., & Vyatkina, N. (2021). Thirty
years of data-driven learning: Taking stock and charting new directions over
time. Language Learning &
Technology, 25(3), 66–89. [URL]
Braun, S. (2006). ELISA:
A pedagogically enriched corpus for language learning
purposes. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus
technology and language
pedagogy (pp. 25–47). Peter Lang.
Breyer, Y. (2009). Learning
and teaching with corpora: Reflections by student teachers. Computer Assisted Language
Learning, 22(2), 153–172.
Bybee, J. (2008). Usage-based
grammar and second language acquisition. In P. Robinson & N. C. Ellis (Eds.), Handbook
of cognitive linguistics and second language
acquisition (pp. 226–246). Routledge.
Callies, M. (2019). Integrating
corpus literacy into language teacher education: The case of learner
corpora. In S. Götz & J. Mukherjee (Eds.), Learner
corpora and language
teaching (pp. 245–264). John Benjamins.
Chung, T. T., Crosthwaite, P., Cao, C. T. H., & de Carvalho, C. T. (2024). Walking
the walk? (Mis)alignment of EFL teachers’ self-reported corpus literacy skills and their competence in planning and
implementing corpus-based language pedagogy. TESOL
Quarterly, 58(1).
Cobb, T. (1999). Breadth
and depth of lexical acquisition with hands-on concordancing. Computer Assisted
Language
Learning, 12(4), 345–360.
Crosthwaite, P. (Ed.). (2019). Data-driven
learning for the next generation: Corpora and DDL for pre-tertiary
learners. Routledge.
Crosthwaite, P., & Anthony, L. (2024, March 28). Methodological
considerations in the selection and design of corpus tools. Presentation given at
the BAAL Corpus Linguistics SIG Symposium, March 27,
2024, Edinburgh University, UK. Retrieved
on 16 May
2025 from [URL]
Crosthwaite, P., & Baisa, V. (2024). A
user-friendly corpus tool for disciplinary data-driven learning: Introducing
CorpusMate. International Journal of Corpus
Linguistics, 29(4), 595–610.
Crosthwaite, P., & Boulton, A. (2023). DDL
is dead? Long live DDL! Expanding the boundaries of data-driven
learning. In H. Tyne (Eds.), Discovering
language: Learning and affordance. Routledge.
Crosthwaite, P., Ningrum, S., & Schweinberger, M. (2023). Research
trends in corpus linguistics: A bibliometric analysis of two decades of Scopus-indexed corpus linguistics research in
arts and humanities. International Journal of Corpus
Linguistics, 28(3), 344–377.
Crosthwaite, P., Luciana, L., & Wijaya, D. (2021). Voices
from the periphery: Perceptions of Indonesian primary vs secondary pre-service teacher trainees about corpora and
data-driven learning in the L2 English classroom. Applied Corpus
Linguistics, 1(1), 100003.
Crosthwaite, P., Wong, L. L., & Cheung, J. (2019). Characterising
postgraduate students’ corpus query and usage patterns for disciplinary data-driven
learning. ReCALL, 31(3), 255–275.
Davies, M. (2019, July). The
best of both worlds: Multi-billion word “dynamic”
corpora. In Proceedings of the Workshop on Challenges in the
Management of Large Corpora
(CMLC-7) (pp. 23–28).
Davies, M. (2023). Creating
and using “virtual corpora” to extract and analyse domain-specific vocabulary at
English-Corpora.org. In J. Pan & S. Laviosa (Eds.), Corpora
and Translation Education: Advances and
Challenges (pp. 89–108). Springer.
Davies, M. (2021). The
TV and Movies corpora: Design, construction, and use. International Journal of Corpus
Linguistics, 26(1), 10–37.
Farr, F. (2008). Evaluating
the use of corpus-based instruction in a language teacher education context: Perspectives from the
users. Language
Awareness, 17(1), 25–43.
Flowerdew, L. (2015). Data-driven
learning and language learning theories. In A. Leńko-Szymańska & A. Boulton (Eds.), Multiple
affordances of language corpora for data-driven
learning (pp. 15–36). John Benjamins.
Gablasova, D., & Brezina, V. (2015). Does
speaker role affect the choice of epistemic adverbials in L2 speech? Evidence from the Trinity Lancaster
Corpus. Yearbook of Corpus Linguistics and Pragmatics 2015: Current Approaches to
Discourse and Translation Studies, 117–136.
Green, C. (2017). Introducing
the Corpus of the Canon of Western Literature: A corpus for culturomics and
stylistics. Language and
Literature, 26(4), 282–299.
Gries, S. T., & Wulff, S. (2009). Psycholinguistic
and corpus-linguistic evidence for L2 constructions. Annual Review of Cognitive
Linguistics, 7(1), 163–186.
Heather, J., & Helt, M. (2012). Evaluating
corpus literacy training for pre-service language teachers: Six case studies. Journal
of Technology and Teacher
Education, 20(4), 415–440.
Hsiao, J. C., & Chang, J. S. (2023). Enhancing
EFL reading and writing through AI-powered tools: design, implementation, and evaluation of an online
course. Interactive Learning
Environments, 32(9), 4934–4949.
Johns, T. (1991). Should
you be persuaded — Two samples of data-driven learning materials. Retrieved
on 16 May
2025 from [URL]
Jones, C., & Oakey, D. (2024). Learners’
perceived development of spoken grammar awareness after corpus-informed instruction: An exploration of learner
diaries. TESOL
Quarterly, 58(3), 1138–1165.
Karras, J. N. (2016). The
effects of data-driven learning upon vocabulary acquisition for secondary international school students in
Vietnam. ReCALL, 28(2), 166–186.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014). The
Sketch Engine: Ten years
on. Lexicography, 1(1), 7–36.
Kilgarriff, A., Marcowitz, F., Smith, S., & Thomas, J. (2015). Corpora
and language learning with the Sketch Engine and SKELL. Revue Française de Linguistique
Appliquée, 20(1), 61–80.
Kortmann, B. (2021). Reflecting
on the quantitative turn in
linguistics. Linguistics, 59(5), 1207–1226.
Kytö, M. (2011). Corpora
and historical linguistics. Revista Brasileira de Linguística
Aplicada, 11, 417–457.
Lai, S. L., & Chen, H. J. H. (2015). Dictionaries
vs concordancers: Actual practice of the two different tools in EFL writing. Computer
Assisted Language
Learning, 28(4), 341–363. Retrieved 16 April 2024 from [URL].
Le Foll, E. (2021). Creating
corpus-informed materials for the English as a foreign language classroom. UQ Pressbooks.
Le Foll, E. (2022). Why
we need open science and open education to bridge the corpus research–practice
gap. In R. R. Jablonkai & E. Csomay (Eds.), Corpora
for language
learning (pp. 142–156). Routledge.
Leńko-Szymańska, A. (2017). Training
teachers in data-driven learning: Tackling the challenge. Language Learning &
Technology, 21, 217–241.
Liao, S., & Lei, L. (2017). What
we talk about when we talk about corpus: A bibliometric analysis of corpus-related research in linguistics
(2000–2015). Glottometrics, 38, 1–20.
Lozano, A. A., & Izquierdo, J. (2019). Technology
in second language education: Overcoming the digital divide. Emerging Trends in
Education, 2(3), 52–70.
Lusta, A., Demirel, Ö., & Mohammadzadeh, B. (2023). Language
corpus and data-driven learning (DDL) in language classrooms: A systematic
review. Heliyon, 9(12), e22731.
Ma, Q., Tang, J., & Lin, S. (2022). The
development of corpus-based language pedagogy for TESOL teachers: A two-step training approach facilitated by online
collaboration. Computer Assisted Language
Learning, 35(9), 2731–2760.
Meunier, F. (2019). A
case for constructive alignment in DDL: Rethinking outcomes, practices, and assessment in (data-driven) language
learning. In P. Crosthwaite (Ed.), Data-driven
learning for the next
generation (pp. 13–30). Routledge.
Mishan, F. (2004). Authenticating
corpora for language learning: A problem and its resolution. ELT
Journal, 58(3), 219–227.
Mukherjee, J. (2006). Corpus
linguistics and language pedagogy: The state of the art — And
beyond. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus
technology and language pedagogy: New resources, new tools, new
methods (pp. 5–24). Peter Lang.
O’Keeffe, A. (2021). Data-driven
learning, theories of learning and second language acquisition, in search of
intersections. In P. Pérez-Paredes & G. Mark (Eds.), Beyond
Concordance Lines: Corpora in Language
Education (pp. 35–55). John Benjamins.
Özbay, A. S., & Olgun, O. (2017). The
application of DDL for teaching preposition collocations to Turkish EFL
learners. International Journal of Research in Teacher
Education, 8(3), 1–10.
Pérez-Paredes, P. (2019). The
pedagogic advantage of teenage corpora for secondary school
learners. In P. Crosthwaite (Ed.), Data-driven
learning for the next
generation (pp. 67–87). Routledge.
Pinto, P. T., Crosthwaite, P., de Carvalho, C. T., Spinelli, F., Serpa, T., Garcia, W., & Ottaiano, A. O. (2023). Using
language data to learn about language: A teachers’ guide to classroom corpus use. UQ Pressbooks.
Poole, F. J., & Toda Cosi, M. (2025). Generative
AI in instructed SLA: Bridging AI literacies with
pedagogy. In S. Loewen, F. Poole, H. Hwang, & M. D. Coss (Eds.), Technology
and instructed second language acquisition: Connecting research and pedagogy (pp.
163–184). John Benjamins. (this
volume)
Roebers, C. M. (2017). Executive
function and metacognition: Towards a unifying framework of cognitive
self-regulation. Developmental
Review, 45, 31–51.
Römer, U. (2011). Corpus
research applications in second language teaching. Annual Review of Applied
Linguistics, 31, 205–225.
Römer, U. (2023). Usage-based
approaches to second language acquisition vis-à-vis data-driven learning. TESOL
Quarterly, 58(3), 1235–1245.
Saeedakhtar, A., Bagerin, M., & Abdi, R. (2020). The
effect of hands-on and hands-off data-driven learning on low-intermediate learners’ verb-preposition
collocations. System, 102268.
Schmidt, R. W. (1990). The
role of consciousness in second language learning. Applied
Linguistics, 11(2), 129–158.
Schweinberger, M. (2024). Collocation
tool. Retrieved from [URL]
Schweinberger, M., & Haugh, M. (forthcoming). Reproducibility
and transparency in interpretive corpus pragmatics. International Journal of Corpus
Linguistics.
Shulman, L. (1987). Knowledge
and teaching: Foundations of the new reform. Harvard Educational
Review, 57(1), 1–23.
Siepmann, D. (2015). Dictionaries
and spoken language: A corpus-based review of French dictionaries. International
Journal of
Lexicography, 28(2), 139–168.
Soruç, A., & Tekin, B. (2017). Vocabulary
learning through data-driven learning in an English as a second language
setting. Educational Sciences: Theory &
Practice, 17(6), 1811–1832.
Taghizadeh, M., & Hasani Yourdshahi, Z. (2019). Integrating
technology into young learners’ classes: Language teachers’ perceptions. Computer
Assisted Language
Learning, 33(8), 982–1006.
Ueno, S., & Takeuchi, O. (2023). Effective
corpus use in second language learning: A meta-analytic approach. Applied Corpus
Linguistics, 3(3), 100076.
Cited by one other publication
Montero-Pérez, Maribel
2025. The impact of multimodal input tools in instructed second language acquisition (and beyond). In Technology and Instructed Second Language Acquisition [Language Learning & Language Teaching, 63], pp. 139 ff.
