In: Multiple Affordances of Language Corpora for Data-driven Learning
Edited by Agnieszka Leńko-Szymańska and Alex Boulton
[Studies in Corpus Linguistics 69] 2015
pp. 267–296
Applying data-driven learning to the web
Published online: 13 May 2015
https://doi.org/10.1075/scl.69.13bou
Data-driven learning typically involves the use of dedicated concordancers to explore linguistic corpora, which may require significant training if the technology is not to be an obstacle for teacher and learner alike. One possibility is to begin not with corpus or concordancer, but to find parallels with what ‘ordinary’ users already do. This paper compares the web to a corpus, regular search engines to concordancers, and the techniques used in web searches to data-driven learning. It also examines previous studies which exploit web searches in ways not incompatible with a DDL approach.
Keywords: data-driven learning, DDL, Google, Internet, search engines, web as corpus
