In: Challenges in Corpus Linguistics: Rethinking corpus compilation and analysis
Edited by Mark Kaunisto and Marco Schilk
[Studies in Corpus Linguistics 118] 2024
pp. 55–67
Challenges in the compilation, annotation, and analysis of learner corpus data
Published online: 19 September 2024
https://doi.org/10.1075/scl.118.04cal
Abstract
This chapter highlights and discusses the special characteristics of learner corpus data and the challenges they may present for corpus compilation, annotation, and analysis. Because learner corpus and SLA researchers use their data to study L2 production and development, it is of utmost importance that the data are valid, that is, that they represent "authentic" L2 production. This means that the data must stem from the studied learners' own language production. I discuss challenges in three areas: (1) multilingual practices and metalinguistic language use, (2) lexical and constructional bias, often brought about by the wording of the task instructions or writing prompts that learners are asked to respond to, and (3) learner corpus annotation in view of the "discourse of deficit" in SLA. For each of these challenges, I offer solutions as to how it can be met.
Article outline
- 1. Introduction and general remarks
- 2. Challenges and how to respond to them
  - 2.1 Multilingual practices and metalinguistic language use
    - Response
  - 2.2 Task effects
    - Response
  - 2.3 "Discourse of deficit" and learner corpus annotation
    - Response
- 3. Summary and conclusion
- Notes
- References
References (26)
Alexopoulou, Theodora, Geertzen, Jeroen, Korhonen, Anna & Meurers, Detmar. 2015. Exploring big educational learner corpora for SLA research: Perspectives on relative clauses. International Journal of Learner Corpus Research 1(1): 96–129.
Callies, Marcus. 2008. Easy to understand but difficult to use? Raising constructions and information packaging in the advanced learner variety. In Linking Contrastive and Learner Corpus Research [Language and Computers, Studies in Practical Linguistics Series 66], Gaëtanelle Gilquin, Szilvia Papp & María Belén Díez-Bedmar (eds), 201–226. Amsterdam: Rodopi.
Callies, Marcus. 2016. Towards a process-oriented approach to comparing EFL and ESL varieties: A corpus-study of lexical innovations. International Journal of Learner Corpus Research 2(2): 229–250.
Callies, Marcus. 2023. Errors and innovations in L2 varieties of English: Towards resolving a contradictory practice. In Contradiction Studies – Exploring the Field, Gisela Febel, Kerstin Knopf & Martin Nonhoff (eds), 201–214. New York NY: Springer.
Callies, Marcus & Wiemeyer, Leonie. 2017. Multilingual speakers, multilingual texts: Multilingual practices in learner corpora. In Challenging the Myth of Monolingual Corpora [Language and Computers 80], Arja Nurmi, Tanja Rütten & Päivi Pahta (eds), 80–94. Leiden: Brill.
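Callies, Marcus & Hehner, Stefanie. 2021. Konstruktionen mit Partikelverben in Varietäten des Englischen: Zum Spannungsfeld von Präskription und Innovation an der Schnittstelle von Sprachwissenschaft, Fremdsprachendidaktik und Unterrichtspraxis [Constructions with particle verbs in varieties of English: On the tension between prescription and innovation at the interface of linguistics, foreign-language pedagogy, and classroom practice]. In Sprachwissenschaft und Fremdsprachendidaktik: Konstruktionen und Konstruktionslernen [Linguistics and foreign-language pedagogy: Constructions and construction learning], Christoph Bürgel, Paul Gévaudan & Dirk Siepmann (eds), 81–93. Tübingen: Stauffenburg.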
Gilquin, Gaëtanelle. 2015. From design to collection of learner corpora. In The Cambridge Handbook of Learner Corpus Research, Sylviane Granger, Gaëtanelle Gilquin & Fanny Meunier (eds), 9–34. Cambridge: CUP.
Gilquin, Gaëtanelle. 2022. The Process Corpus of English in Education: Going beyond the written text. Research in Corpus Linguistics 10(1): 31–44.
Gilquin, Gaëtanelle, De Cock, Sylvie & Granger, Sylviane (eds). 2010. The Louvain International Database of Spoken English Interlanguage. Handbook and CD-ROM. Louvain-la-Neuve: Presses universitaires de Louvain.
Gilquin, Gaëtanelle & Laporte, Samantha. 2021. The use of online writing tools by learners of English: Evidence from a process corpus. International Journal of Lexicography 34(4): 472–492.
Granger, Sylviane, Gilquin, Gaëtanelle & Meunier, Fanny (eds). 2015. The Cambridge Handbook of Learner Corpus Research. Cambridge: CUP.
Hartwell, Kelly & Aull, Laura. 2023. Editorial introduction: AI, corpora, and future directions for writing assessment. Assessing Writing 57: 1–4.
Ishikawa, Shin’ichiro. 2023. The ICNALE Guide: An Introduction to a Learner Corpus Study on Asian Learners’ L2 English. New York NY: Routledge.
Kaszubski, Przemysław. 2005. Typical errors of Polish advanced EFL learner writers. <[URL]> (currently not accessible).
König, Alexander, Frey, Jennifer-Carmen, Stemle, Egon W., Glaznieks, Aivars & Paquot, Magali. 2022. Towards standardizing LCR metadata. Paper presented at the 6th International Conference for Learner Corpus Research (LCR 2022), Padova, 22–24 September.
Lozano, Cristóbal & Mendikoetxea, Amaya. 2010. Interface conditions on postverbal subjects: A corpus study of L2 English. Bilingualism: Language and Cognition 13(4): 475–497.
Lüdeling, Anke & Hirschmann, Hagen. 2015. Error annotation systems. In The Cambridge Handbook of Learner Corpus Research, Sylviane Granger, Gaëtanelle Gilquin & Fanny Meunier (eds), 135–158. Cambridge: CUP.
Nelson, Gerald. 2002. International Corpus of English. Markup Manual for Written Texts. <[URL]> (15 March 2024).
O’Donnell, Matthew Brook, Römer, Ute & Ellis, Nick C. 2013. The development of formulaic language in first and second language writing: Investigating effects of frequency, association, and native norm. International Journal of Corpus Linguistics 18: 83–108.
Ortega, Lourdes. 2013. SLA for the 21st century: Disciplinary progress, transdisciplinary relevance, and the bi/multilingual turn. Language Learning 63(1): 1–24.
Paquot, Magali & Callies, Marcus. 2020. Promoting methodological expertise, transparency, replication, and cumulative learning: Introducing new manuscript types in the International Journal of Learner Corpus Research. International Journal of Learner Corpus Research 6(2): 121–124.
Sinclair, John McH. 2005. Corpus and text – Basic principles. In Developing Linguistic Corpora: A Guide to Good Practice, Martin Wynne (ed.), 1–16. Oxford: Oxbow Books.
SPLLOC Transcription Guidelines. 2008. <[URL]> (1 August 2022).
Tracy-Ventura, Nicole & Paquot, Magali (eds). 2021. The Routledge Handbook of Second Language Acquisition and Corpora. London: Routledge.
Wiemeyer, Leonie. 2022. Intertextuality in Foreign-language Academic Writing in English. A Mixed-methods Study of University Students’ Writing Products and Processes in Source-based Disciplinary Assignments. PhD dissertation, University of Bremen.
