In: Beyond Concordance Lines: Corpora in language education
Edited by Pascual Pérez-Paredes and Geraldine Mark
[Studies in Corpus Linguistics 102] 2021
pp. 9–34
Chapter 1. Research in data-driven learning
Published online: 22 December 2021
https://doi.org/10.1075/scl.102.01bou
Abstract
Data-driven learning (DDL) typically involves language learners consulting corpus data, either directly or via prepared materials, to answer questions about language. The approach has been mooted since the beginning of the modern era of corpus linguistics and has come to be associated with the work of Tim Johns, who coined the term in print in 1990. Since then, hundreds of studies have attempted to evaluate some aspect of DDL, giving rise to several reviews and syntheses. This paper introduces DDL and discusses the syntheses to date, before analysing a rigorous collection of 351 studies published up to and including 2018. While previous syntheses have evaluated the field, the objective here is to provide an overview of how researchers see DDL across the board, to identify more clearly what DDL actually looks like today and how it has evolved since its beginnings in the 1980s, and to suggest avenues for future research in underexplored areas.
Keywords: data-driven learning, concordancing, synthesis
Article outline
- Data-driven learning
- Criticisms of DDL: Theoretical and empirical support
- Surveys of DDL
- Methodology
- Results and discussion
- Publication sources
- Language and geography
- Demographics
- Aims, corpora and tools
- Design
- Discussion and conclusion
- Notes
- References
References (52)
Abu Alshaar, A., & Abuseileek, A. F. (2013). Using concordancing and word processing to improve EFL graduate students’ written English. JALT CALL Journal, 9(1), 59–77.
Al-Gamal, A. A. M., & Mohammed Ali, E. A. M. (2019). Corpus-based method in language learning and teaching. International Journal of Research and Analytical Reviews, 6(2), 473–476.
An, X.-H., & Xu, M.-Y. (2013). An empirical research on DDL in L2 writing. US-China Education Review A, 3(9), 693–701.
Anthony, L. (2019). AntConc [version 3.5.8m]. Tokyo: Waseda University. [URL]
Bernardini, S. (2000). Systematising serendipity: Proposals for concordancing large corpora with language learners. In L. Burnard & T. McEnery (Eds.), Rethinking language pedagogy from a corpus perspective (pp. 225–234). Peter Lang.
Boulton, A. (2008). But where’s the proof? The need for empirical evidence for data-driven learning. In M. Edwardes (Ed.), Technology, ideology and practice in applied linguistics (pp. 13–16). Scitsiugnil Press. Retrieved from [URL]
Boulton, A. (2009). Data-driven learning: Reasonable fears and rational reassurance. Indian Journal of Applied Linguistics, 35(1), 81–106.
Boulton, A. (2010). Learning outcomes from corpus consultation. In M. Moreno Jaén, F. Serrano Valverde, & M. Calzada Pérez (Eds.), Exploring new paths in language pedagogy: Lexis and corpus-based language teaching (pp. 129–144). Equinox.
Boulton, A. (2011). Data-driven learning: The perpetual enigma. In S. Goźdź-Roszkowski (Ed.), Explorations across languages and corpora (pp. 563–580). Peter Lang.
Boulton, A. (2012). Corpus consultation for ESP: A review of empirical research. In A. Boulton, S. Carter-Thomas, & E. Rowley-Jolivet (Eds.), Corpus-informed research and learning in ESP: Issues and applications (pp. 261–291). John Benjamins.
Boulton, A. (2015). Applying data-driven learning to the web. In A. Leńko-Szymańska & A. Boulton (Eds.), Multiple affordances of language corpora for data-driven learning (pp. 267–295). John Benjamins.
Boulton, A., & Cobb, T. (2017). Corpus use in language learning: A meta-analysis. Language Learning, 67(2), 348–393.
Boulton, A., & Vyatkina, N. (2021). Thirty years of data-driven learning: Taking stock and charting new directions over time. Language Learning & Technology, 25(3).
Burston, J., & Arispe, K. (2018). Looking for a needle in a haystack: CALL and advanced language proficiency. Calico Journal, 35(1), 77–102.
Chambers, A. (2007). Popularising corpus consultation by language learners and teachers. In E. Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the foreign language classroom (pp. 3–16). Rodopi.
Chambers, A. (2019). Towards the corpus revolution? Bridging the research–practice gap. Language Teaching, 52(4), 460–475.
Chen, M., & Flowerdew, J. (2018). A critical review of research and practice in data-driven learning (DDL) in the academic writing classroom. International Journal of Corpus Linguistics, 23(3), 335–369.
Cobb, T., & Boulton, A. (2015). Classroom applications of corpus analysis. In D. Biber & R. Reppen (Eds.), Cambridge handbook of English corpus linguistics (pp. 478–497). Cambridge University Press.
Cresswell, A. (2007). Getting to ‘know’ connectors? Evaluating data-driven learning in a writing skills course. In E. Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the foreign language classroom (pp. 267–287). Rodopi.
Crosthwaite, P., & Stell, A. (2019). It helps me get ideas on how to use my words: Primary school students’ initial reactions to corpus use in a private tutoring setting. In P. Crosthwaite (Ed.), Data-driven learning for the next generation: Corpora and DDL for pre-tertiary learners (pp. 150–170). Routledge.
Flowerdew, L. (2015). Data-driven learning and language learning theories: Whither the twain shall meet. In A. Leńko-Szymańska & A. Boulton (Eds.), Multiple affordances of language corpora for data-driven learning (pp. 15–36). John Benjamins.
Gilquin, G., & Granger, S. (2010). How can data-driven learning be used in language teaching? In A. O’Keeffe & M. McCarthy (Eds.), Routledge handbook of corpus linguistics (pp. 359–370). Routledge.
Han, Z. (2015). Striving for complementarity between narrative and meta-analytic reviews. Applied Linguistics, 36(3), 409–415.
Johns, T. (1988). Whence and whither classroom concordancing? In T. Bongaerts, P. de Haan, S. Lobbe, & H. Wekker (Eds.), Computer applications in language learning (pp. 9–27). Foris.
Johns, T. (1990). From printout to handout: Grammar and vocabulary teaching in the context of data-driven learning. CALL Austria, 10, 14–34.
Johns, T. (1991). Should you be persuaded: Two samples of data-driven learning materials. In T. Johns & P. King (Eds.), Classroom concordancing. English Language Research Journal, 4, 1–16.
Johns, T., & King, P. (1991). Editors’ preface. In T. Johns & P. King (Eds.), Classroom concordancing. English Language Research Journal, 4, iii–iv.
Johns, T. (1997). Contexts: The background, development and trialling of a concordance-based CALL program. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (Eds.), Teaching and language corpora (pp. 100–115). Addison Wesley Longman.
Johns, T., Lee, H., & Wang, L. (2008). Integrating corpus-based CALL programs and teaching English through children’s literature. Computer Assisted Language Learning, 21(5), 483–506.
Lee, H., Warschauer, M., & Lee, J. H. (2019). The effects of corpus use on second language vocabulary learning: A multilevel meta-analysis. Applied Linguistics, 40(5), 721–753.
Luo, Q. (2016). The effects of data-driven learning activities on EFL learners’ writing development. SpringerPlus, 5, n.p.
Ma, B. (1993). Small-corpora concordancing in ESL teaching and learning. Hong Kong Papers in Linguistics and Language Teaching, 16, 11–30.
McKay, S. (1980). Teaching the syntactic, semantic and pragmatic dimensions of verbs. TESOL Quarterly, 14(1), 17–26.
Mizumoto, A., & Chujo, K. (2015). A meta-analysis of data-driven learning approach in the Japanese EFL classroom. English Corpus Studies, 22, 1–18.
Pérez-Paredes, P. (2019). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning.
Plonsky, L., & Oswald, F. L. (2014). How big is ‘big’? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912.
Plonsky, L., & Ziegler, N. (2016). The CALL–SLA interface: Insights from a second-order synthesis. Language Learning & Technology, 20(2), 17–37. 10125/44459
Römer, U. (2011). Corpus research applications in second language teaching. Annual Review of Applied Linguistics, 31, 205–225.
Shintani, N., Li, S., & Ellis, R. (2013). Comprehension-based versus production-based grammar instruction: A meta-analysis of comparative studies. Language Learning, 63(2), 296–329.
Cited by (13)
Alamri, Basim & Assem Alqarni
Ballarè, Silvia, Claudia Borghetti, Paolo Della Putta, Caterina Mauri & Eleonora Zucchini
Flowerdew, Lynne
Kurt, Gökçe
Mahmoudi-Gahrouei, Vahid, Mariusz Kruk & Samira Atefi Boroujeni
Pérez-Paredes, Pascual & Alex Boulton
Pérez-Paredes, Pascual & Angela Chambers
2025. Corpora and the learning and teaching of French and Spanish. In Applying Corpora in Teaching and Learning Romance Languages [Studies in Corpus Linguistics, 122], pp. 12 ff.
Pérez-Paredes, Pascual, Niall Curry & Pilar Aguado Jiménez
Zasina, Adrian Jan
Şahin Kızıl, Aysel
Römer, Ute
Sarré, Cédric, Cédric Brudermann & Muriel Grosbois
2024. Using learner corpus data for grammatical accuracy development in written productions. International Journal of Learner Corpus Research 10:1, pp. 107 ff.
This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
