Introduction published in: Reproducibility, Replicability, and Robustness in Corpus Linguistics
Edited by Martin Schweinberger and Michael Haugh
[International Journal of Corpus Linguistics 30:2] 2025
pp. 119–129
Introduction
Reproducibility, replicability, and robustness in corpus linguistics
Published online: 12 June 2025
https://doi.org/10.1075/ijcl.25081.sch
Abstract
This introduction to the special issue Reproducibility, Replicability, and Robustness in Corpus
Linguistics calls for more transparent and robust research practices in the field. It situates the discussion within
the broader replication crisis in the life and social sciences and explores its relevance for corpus linguistics. The article
identifies key areas for improvement — data management, workflows, and reporting — and showcases tools and principles such as
FAIR/CARE, version control, reproducible notebooks, and open repositories. It highlights how corpus linguistics can build on open
science infrastructures to enhance methodological rigor. Practical challenges, including data sensitivity and skill gaps, are
addressed with actionable strategies. The issue brings together contributions that clarify core terminology, test the robustness
of established methods, and suggest concrete ways forward. Together, these articles offer conceptual and practical guidance for
making corpus linguistic research more open, verifiable, and aligned with broader scientific standards.
Keywords: reproducibility, replicability, robustness, accountability, transparency
Article outline
- 1. The replication crisis as an opportunity for (corpus) linguistics
- 2. Pointers for transparent, reproducible, and robust corpus linguistics research
- 3. Structure of this special issue
- 4. Future directions
- Notes
- References
