Article published in: Linguistic Innovations: Rethinking linguistic creativity in non-native Englishes
Edited by Sandra C. Deshors, Sandra Götz and Samantha Laporte
[International Journal of Learner Corpus Research 2:2] 2016
pp. 177–204
Detecting innovations in a parsed corpus of learner English
Published online: 20 October 2016
https://doi.org/10.1075/ijlcr.2.2.03sch
In research on L2 English, recent corpus-based studies indicate that some non-standard forms are shared by indigenized (ESL) and foreign (EFL) varieties of English, which challenges the idea of a clear dichotomy between innovation and error. We present a data-driven large-scale method to detect innovations, test it on verb + preposition structures (including phrasal verbs) and adjective + preposition structures, and describe similarities and differences between EFL and ESL. We use a dependency-parsed version of the International Corpus of Learner English to automatically extract potential innovations, defined as patterns of overuse compared to the British National Corpus as reference corpus. We measure overuse by means of collocation measures like O/E or T-score, and compare our results with similar results for ESL. In both quantitative and qualitative analyses, we detect similarities between the two varieties (e.g. discuss about) and dissimilarities (e.g. accuse for, only distinctive for EFL). We report more verb/adjective + preposition combinations than previous studies and discuss the roles of analogy and transfer.
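The two collocation measures named in the abstract can be illustrated with a minimal sketch. It uses the textbook definitions (expected frequency E = f1 * f2 / N under independence, O/E as the ratio of observed to expected frequency, and T-score as (O - E) / sqrt(O)). The function names and the counts are hypothetical and chosen for illustration only; they are not taken from the article, and the article's actual extraction pipeline and thresholds are not shown here.

```python
import math


def expected(f_head: int, f_prep: int, n: int) -> float:
    """Expected co-occurrence frequency if head and preposition were independent."""
    return f_head * f_prep / n


def o_over_e(observed: int, f_head: int, f_prep: int, n: int) -> float:
    """Observed/expected ratio; values above 1 mean the pair occurs more often than chance."""
    return observed / expected(f_head, f_prep, n)


def t_score(observed: int, f_head: int, f_prep: int, n: int) -> float:
    """T-score association measure: (O - E) / sqrt(O). Undefined for O = 0."""
    if observed <= 0:
        raise ValueError("t-score requires at least one observed co-occurrence")
    return (observed - expected(f_head, f_prep, n)) / math.sqrt(observed)


# Hypothetical toy counts (NOT from the article): a verb occurring 200 times,
# a preposition occurring 5,000 times, 30 attested verb + preposition pairs,
# in a sample of 1,000,000 dependency relations.
O, F_VERB, F_PREP, N = 30, 200, 5_000, 1_000_000

print("E   =", expected(F_VERB, F_PREP, N))
print("O/E =", o_over_e(O, F_VERB, F_PREP, N))
print("T   =", round(t_score(O, F_VERB, F_PREP, N), 3))
```

O/E tends to favour rare combinations, while T-score rewards pairs with more supporting evidence, which is one reason to consult both when ranking candidate patterns in the learner corpus against a reference corpus.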
