Article published in: International Journal of Corpus Linguistics
Vol. 29:4 (2024), pp. 595–610
A user-friendly corpus tool for disciplinary data-driven learning
Introducing CorpusMate
Published online: 16 April 2024
https://doi.org/10.1075/ijcl.23056.cro
Abstract
Most corpus tools used for corpus-based data-driven learning (DDL) are designed for research rather than
teaching purposes. Much DDL research suggests that learners and their teachers often abandon DDL after initial training because of
tool-related issues such as complex user interfaces and system settings. Based on feedback from secondary-age language learners and
their teachers in the Australian context, we present CorpusMate (https://corpusmate.com), a new, user-friendly corpus tool that incorporates several publicly available written and
spoken corpora across 20 disciplinary subjects. It offers a range of flexible concordancing, n-gram and data visualisation options
to ensure a fast, smooth and simple DDL experience for end users.
Article outline
- 1. Introduction
- 2. Platform design
- 3. Corpus data
  - 3.1 Sources
    - British Academic Written English corpus
    - TED talk corpus
    - Simple English Wikipedia
    - BBC Teach
    - Elsevier OA CC-BY corpus
    - BNC 2014 Spoken
  - 3.2 Data preparation
  - 3.3 Disciplinary subject area and mode filters
- 4. CorpusMate functions
  - 4.1 Frontend UI
  - 4.2 Concordancing options
  - 4.3 Visualisation options
- 5. Conclusion
- Notes
