In: Learner Corpora in Language Testing and Assessment
Edited by Marcus Callies and Sandra Götz
[Studies in Corpus Linguistics 70] 2015
pp. 85–112
First steps in assigning proficiency to texts in a learner corpus of computer-mediated communication
Published online: 9 April 2015
https://doi.org/10.1075/scl.70.04mar
This chapter presents a new method for assigning proficiency levels to texts in a
learner corpus of computer-mediated communication (CMC). The CMC comes
from learner comments on news articles that form part of an English language
course for university students in Japan. The rationale for using the CMC discourse
as the basis of a learner corpus will be discussed, followed by a justification
for using a text-centred approach to assigning proficiency. The use of binary
decision trees to account for the complexity, accuracy and fluency evident in
the texts will be described, followed by a snapshot of the results from using the
method so far. The chapter concludes with the suggestion that while some of the
details may need refining, in principle the method could be of use in categorizing
the proficiency of texts in other learner corpora.
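
To make the idea of a binary decision tree concrete, here is a minimal sketch of how a sequence of yes/no questions about fluency, accuracy and complexity could route a text to a proficiency band. Everything specific in it is an assumption made for illustration: the feature names, the thresholds, the order of the questions and the four-band scale are hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass
class TextFeatures:
    # Hypothetical measures chosen for illustration only; the chapter's
    # actual criteria are not reproduced here.
    words: int                # fluency proxy: length of the comment
    errors: int               # accuracy proxy: number of marked errors
    subordinate_clauses: int  # complexity proxy: amount of subordination


def assign_level(text: TextFeatures) -> int:
    """Walk a chain of yes/no questions and return an illustrative band (1-4)."""
    # Question 1 (fluency): is the text at least 30 words long? (assumed cut-off)
    if text.words < 30:
        return 1
    # Question 2 (accuracy): is there less than one error per ten words? (assumed)
    if text.errors * 10 >= text.words:
        return 2
    # Question 3 (complexity): does the text contain any subordinate clause? (assumed)
    if text.subordinate_clauses == 0:
        return 3
    return 4


if __name__ == "__main__":
    sample = TextFeatures(words=50, errors=4, subordinate_clauses=1)
    print(assign_level(sample))
```

Because each node asks a single yes/no question, a rater (or a script) needs only one judgement at a time, which is the general appeal of this kind of scale; the real questions and their ordering would have to come from the chapter itself.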