Article published in: International Journal of Learner Corpus Research
Vol. 6:2 (2020), pp. 237–251
Materials & Methods Report
Inter-rater reliability in Learner Corpus Research
Insights from a collaborative study on adverb placement
Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 10 December 2020
https://doi.org/10.1075/ijlcr.20001.lar
Abstract
In Learner Corpus Research (LCR), a common source of error is the manual coding and annotation of linguistic
features. To estimate the amount of error present in a coded dataset, coefficients of inter-rater reliability are used. However, despite
the importance of reliability and internal consistency for validity and, by extension, study quality, interpretability and generalizability,
it is surprisingly uncommon for studies in the field of LCR to report on such reliability coefficients. In this Methods Report, we use a
recent collaborative research project to illustrate the pertinence of considering inter-rater reliability. In doing so, we hope to initiate
methodological discussion on instrument design, piloting and evaluation. We also suggest some ways forward to encourage increased
transparency in reporting practices.
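As a minimal sketch of what such a coefficient looks like in practice, the following computes Cohen's kappa for two coders who labelled the same items with nominal categories. The abstract does not say which coefficient the project used, so the choice of kappa is an assumption made here for illustration; the function name `cohens_kappa`, the category labels and the ten codes are all hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for ten adverb tokens (labels invented for illustration).
coder_1 = ["initial", "medial", "medial", "final", "medial",
           "initial", "final", "medial", "medial", "final"]
coder_2 = ["initial", "medial", "final", "final", "medial",
           "initial", "final", "medial", "initial", "final"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this invented example the two coders agree on 8 of 10 items (observed agreement 0.80), but their label distributions alone would produce agreement of 0.33 by chance, so kappa is (0.80 - 0.33) / (1 - 0.33), roughly 0.70. Reporting the chance-corrected value alongside raw agreement shows how much of the agreement is attributable to the coding scheme rather than to the label frequencies.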
Article outline
- 1. Introduction
- 2. Working towards increased reliability in a study on adverb placement
- 2.1 The coding scheme
- 2.2 Piloting the coding scheme and estimating inter-rater reliability
- 2.3 Revising the coding scheme
- 2.4 From a single-coder to a double-coder approach
- 3. Conclusion and ways forward
- Note
- Acknowledgements
- Notes
