Article published in: International Journal of Learner Corpus Research
Vol. 8:2 (2022) ► pp.261–282
Corpus report
A multilingual learner corpus for less commonly taught languages
Published online: 26 January 2023
https://doi.org/10.1075/ijlcr.21001.som
Abstract
This article gives a detailed account of the framework and the pedagogical and research applications of the Multilingual Academic Corpus of Assignments – Writing and Speech (MACAWS). MACAWS is a monitor learner corpus of written and oral assignments produced by foreign language learners in their language classrooms. The corpus currently focuses on two less commonly taught languages that are rarely represented in learner corpora, Portuguese and Russian. It contains 124,054 words in Russian and 536,168 words in Portuguese and is updated each semester as new texts are added. The online interface is designed for ease of use by teachers and students. Our novel interactive data-driven learning (iDDL) tool allows concordance lines to be embedded in websites and learning management systems (LMS), which makes it easier for students to interact with them. Researchers can gain access to an offline corpus for greater flexibility.
Article outline
- 1. Introduction: Background and motivation
- 2. Data collection
- 2.1 Context of foreign language programs
- 2.2 Metadata: Course, assignment and learners
- 3. Corpus building
- 3.1 Processing and transcription
- 3.2 De-identification of texts
- 3.3 Corpus organization: Assignment, topic and macrogenre
- 4. Current corpus
- 4.1 Corpus statistics
- 4.2 Corpus interface
- 4.3 Interactive data-driven learning (iDDL)
- 5. Research and pedagogical applications
- 6. Limitations
- 7. Conclusion
- 8. Future directions
- Notes
- References
