In: Complexity, Accuracy and Fluency in Learner Corpus Research
Edited by Agnieszka Leńko-Szymańska and Sandra Götz
[Studies in Corpus Linguistics 104] 2022
pp. 159–180
Measuring lexical accuracy
The categorization of lexical errors in written L2 English
Published online: 1 December 2022
https://doi.org/10.1075/scl.104.07hof
Abstract
This study contributes to the discussion of how lexical error annotation schemes for studying learner language are created and applied. It outlines common theoretical issues that the creators of such schemes face, as well as the practical issues that annotators encounter. The chapter further illustrates the importance of cleanly separating codes that describe errors from codes that seek to explain them. In doing so, the study also highlights the importance of hierarchically structured error tag sets and shows how this previously under-utilized concept affects the comparability of results across learner corpora and their respective annotation schemes.
Keywords: error annotation, error taxonomies, lexical accuracy, lexical errors
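As a rough illustration of why a hierarchical tag set can help with comparability, the sketch below stores each error tag as a path from a general category down to a specific one and then counts errors at a shared level of the hierarchy. The tag names, the two schemes, and the `roll_up` helper are all hypothetical and invented for this example; they are not taken from the ICLE, CLC, or TREACLE schemes discussed in the chapter.

```python
from collections import Counter

# Hypothetical schemes: tag names and hierarchy are invented for
# illustration and do not come from ICLE, CLC, or TREACLE.
# Each tag maps to a path from the most general category downwards.
SCHEME_A = {
    "LEX-CHOICE-NOUN": ("lexical", "word choice", "noun"),
    "LEX-CHOICE-VERB": ("lexical", "word choice", "verb"),
    "LEX-SPELL": ("lexical", "spelling"),
}
SCHEME_B = {
    "WC": ("lexical", "word choice"),
    "SP": ("lexical", "spelling"),
}


def roll_up(tags, scheme, depth):
    """Count errors after truncating each tag's path to `depth` levels."""
    return Counter(scheme[tag][:depth] for tag in tags)


# Toy annotations from two corpora that use different schemes.
corpus_a = ["LEX-CHOICE-NOUN", "LEX-CHOICE-VERB", "LEX-SPELL"]
corpus_b = ["WC", "WC", "SP"]

# At depth 2 both schemes share the same categories, so the counts
# can be compared directly even though scheme A is more fine-grained.
counts_a = roll_up(corpus_a, SCHEME_A, depth=2)
counts_b = roll_up(corpus_b, SCHEME_B, depth=2)
print(counts_a == counts_b)
```

In this toy setup the finer noun/verb distinction of the first scheme is collapsed at depth 2, which is the level both schemes have in common. Real schemes rarely align this neatly, so the mapping of tags to shared levels would itself be a matter of analysis.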
Article outline
- 1. Introduction
- 2. Definitional challenges concerning lexical errors
- 3. Method
- International Corpus of Learner English (ICLE)
- Cambridge Learner Corpus (CLC)
- Teaching Resource Extraction from an Annotated Corpus of Learner English Project (TREACLE)
- 4. Practical issues in error annotation: Decision making
- 4.1 Annotation span: Influence on accuracy scores
- 4.2 Overlaps between spelling, morphology and punctuation
- 4.3 Drawing a line between spelling, word choice and grammatical errors
- 4.4 Overlaps between categories for spelling, morphology and grammar
- 4.5 Mixing error description and error explanation
- 4.6 Addressing structural deviations of learner utterances
- 5. Conclusion
