Article published in: Reproducibility, Replicability, and Robustness in Corpus Linguistics
Edited by Martin Schweinberger and Michael Haugh
[International Journal of Corpus Linguistics 30:2] 2025
pp. 234–260
Reproducibility and transparency in interpretive corpus pragmatics
Published online: 12 June 2025
https://doi.org/10.1075/ijcl.23033.sch
Abstract
In this paper we extend the discussion about reproducibility in corpus linguistics from quantitative to
qualitative corpus-based approaches and argue that concerns about reproducibility can be addressed in interpretive research
paradigms like corpus pragmatics. We first suggest that in interpretive research traditions, transparency is more important than
reproducibility. We then argue that interpretive research can be made more transparent and accessible by using notebooks to share
analytical procedures. We support these claims through a case study in which we analyse responses to information-seeking
utterance-final "or" questions in spoken Australian English data. We use a qualitative, discourse-analytic approach
to systematically examine examples of these utterances from selected corpora. We show how corpus linguistic research can draw on
existing infrastructures and tools for ensuring transparency, reproducibility, and replicability of interpretive analyses of the
pragmatic functions of linguistic tokens in situated contexts.
Article outline
- 1. Introduction
- 2. The reproducibility crisis: Background and terminology
  - 2.1 Definitional issues
  - 2.2 Challenges with reproducibility
- 3. Infrastructure for transparent corpus linguistics research
  - 3.1 Data and project management
  - 3.2 Collaborative workflows, version control, and sharing documents with Git and GitHub
  - 3.3 Coding and programming for analysing language data
  - 3.4 Documentation via notebooks
  - 3.5 Interactivity for exploration and reproduction
- 4. Case study: Responding to utterance-final "or" information-seeking questions
  - 4.1 Data and methodology
    - 4.1.1 Corpus data
    - 4.1.2 Analytical procedure
    - 4.1.3 Infrastructure and tools used for analysis
  - 4.2 Results
  - 4.3 Discussion
- 5. Conclusion
- Acknowledgments
- Notes
References
