Article published In: Reproducibility, Replicability, and Robustness in Corpus Linguistics
Edited by Martin Schweinberger and Michael Haugh
[International Journal of Corpus Linguistics 30:2] 2025
pp. 130–149

References (100)
Alvarez, R. M., & Heuberger, S. (2022). How (not) to reproduce: Practical considerations to improve research transparency in political science. PS: Political Science & Politics, 55(1), 149–154.
Anderson, S. F., & Maxwell, S. E. (2016). There’s more than one way to conduct a replication study: Beyond statistical significance. Psychological Methods, 21(1), 1–12.
Andringa, S., & Godfroid, A. (2020). Sampling bias and the problem of generalizability in applied linguistics. Annual Review of Applied Linguistics, 40, 134–142.
Artner, R., Verliefde, T., Steegen, S., Gomes, S., Traets, F., Tuerlinckx, F., & Vanpaemel, W. (2021). The reproducibility of statistical results in psychological research: An investigation using unpublished raw data. Psychological Methods, 26(5), 527–546.
Bakker, M., & Wicherts, J. M. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43(3), 666–678.
Barth, D., & Kapatsinski, V. (2017). A multimodel inference approach to categorical variant choice: Construction, priming and frequency effects on the choice between full and contracted forms of am, are and is. Corpus Linguistics and Linguistic Theory, 13(2), 203–260.
Belz, A., Agarwal, S., Shimorina, A., & Reiter, E. (2021). A systematic review of reproducibility research in natural language processing. In P. Merlo, J. Tiedemann, & R. Tsarfaty (Eds.), Proceedings of the 16th conference of the European chapter of the Association for Computational Linguistics: Main volume (pp. 381–393). Association for Computational Linguistics.
Bernaisch, T., Gries, S. Th., & Heller, B. (2022). Theoretical models and statistical modelling of linguistic epicentres. World Englishes, 41(3), 333–346.
Biber, D. (1988). Variation across speech and writing. Cambridge University Press.
Bisang, W. (2011). Variation and reproducibility in linguistics. In P. Siemund (Ed.), Linguistic universals and language variation (pp. 237–263). De Gruyter Mouton.
BNC Consortium. (2007). British National Corpus (version 3, BNC XML ed.). [URL]
Bollen, K., Cacioppo, J. T., Kaplan, R. M., Krosnick, J. A., & Olds, J. L. (2015). Social, behavioral, and economic sciences perspectives on robust and reliable science (Report of the Subcommittee on Replicability in Science Advisory Committee to the National Science Foundation Directorate for Social, Behavioral, and Economic Sciences). National Science Foundation. [URL]
Brezina, V., McEnery, T., & Wattam, S. (2015). Collocations in context: A new perspective on collocation networks. International Journal of Corpus Linguistics, 20(2), 139–173.
Brezina, V., & Meyerhoff, M. (2014). Significant or random?: A critical review of sociolinguistic generalisations based on large corpora. International Journal of Corpus Linguistics, 19(1), 1–28.
Brezina, V., & Timperley, M. (2017). How large is the BNC? A proposal for standardised tokenization and word counting. [Conference presentation]. Corpus linguistics conference 2017, Birmingham, UK.
Burch, B., & Egbert, J. (2022a). Confidence intervals for ratios of means applied to corpus-based word frequency classes. Journal of Applied Statistics, 50(7), 1592–1610.
Burch, B., & Egbert, J. (2022b). Word use equivalence and hierarchical word tiers. Journal of Quantitative Linguistics, 30(1), 104–124.
Burch, B., Egbert, J., & Biber, D. (2017). Measuring and interpreting lexical dispersion in corpus linguistics. Journal of Research Design and Statistics in Linguistics and Communication Science, 3(2), 189–216.
Claerbout, J. F., & Karrenbach, M. (1992). Electronic documents give reproducible research a new meaning. In SEG Technical Program expanded abstracts 1992 (pp. 601–604).
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
Doyle, P. G. (2003). Replicating corpus linguistics: A corpus-driven investigation of lexical networks in texts [Unpublished doctoral dissertation]. Lancaster University.
Earp, B. D., & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology, 6, Article 621.
Egbert, J., & Baker, P. (Eds.). (2021). Using corpus methods to triangulate linguistic analysis. Routledge.
Egbert, J., & Biber, D. (2019). Incorporating text dispersion into keyword analyses. Corpora, 14(1), 77–104.
Egbert, J., Biber, D., & Gray, B. (2022). Designing and evaluating language corpora: A practical framework for corpus representativeness. Cambridge University Press.
Egbert, J., Burch, B., & Biber, D. (2020). Lexical dispersion and corpus design. International Journal of Corpus Linguistics, 25(1), 89–115.
Egbert, J., Larsson, T., & Biber, D. (2020). Doing linguistics with a corpus: Methodological considerations for the everyday user (1st ed.). Cambridge University Press.
Eubank, N. (2016). Lessons from a decade of replications at the quarterly journal of political science. PS: Political Science & Politics, 49(2), 273–276.
Flanagan, J. (2017). Reproducible research: Strategies, tools, and workflows. In T. Hiltunen, J. McVeigh, & T. Säily (Eds.), Big and rich data in English corpus linguistics: Methods and explorations. VARIENG. [URL]
Fletcher, S. C. (2021). How (not) to measure replication. European Journal for Philosophy of Science, 11(2), 57.
Fuscone, S., Favre, B., & Prévot, L. (2021). Reproducibility in speech rate convergence experiments. Language Resources and Evaluation, 55(3), 817–832.
Gawne, L., & Berez-Kroeker, A. L. (2018). Reflections on reproducible research. In B. McDonnell, A. L. Berez-Kroeker, & G. Holton (Eds.), Reflections on language documentation 20 years after Himmelmann 1998 (pp. 22–32). University of Hawaiʻi Press. [URL]
Gelman, A., & Loken, E. (2013, November 13). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. [URL]
Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331.
Gervais, W. M. (2021). Practical methodological reform needs good theory. Perspectives on Psychological Science, 16(4), 827–843.
Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587–606.
Gries, S. Th. (2015). The most under-used statistical method in corpus linguistics: Multi-level (and mixed-effects) models. Corpora, 10(1), 95–125.
Gries, S. Th. (2020). Analyzing dispersion. In M. Paquot & S. Th. Gries (Eds.), A practical handbook of corpus linguistics (pp. 99–118). Springer.
Gries, S. Th. (2021). (Generalized linear) mixed-effects modeling: A learner corpus example. Language Learning, 71(3), 757–798.
Gries, S. Th. (2022a). What do (most of) our dispersion measures measure (most)? Dispersion? Journal of Second Language Studies, 5(2), 171–205.
Gries, S. Th. (2022b). Toward more careful corpus statistics: Uncertainty estimates for frequencies, dispersions, association measures, and more. Research Methods in Applied Linguistics, 1(1), Article 100002.
Gries, S. Th., & Paquot, M. (2020). Writing up a corpus-linguistic paper. In M. Paquot & S. Th. Gries (Eds.), A practical handbook of corpus linguistics (pp. 647–659). Springer.
Hackert, S. (2008). Counting and coding the past: Circumscribing the variable context in quantitative analyses of past inflection. Language Variation and Change, 20(1), 127–153.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust statistics: The approach based on influence functions (1st ed.). Wiley.
Hardwicke, T. E., Bohn, M., MacDonald, K., Hembacher, E., Nuijten, M. B., Peloquin, B. N., deMayo, B. L., Yoon, E. J., & Frank, M. C. (2021). Analytic reproducibility in articles receiving open data badges at the journal Psychological Science: An observational study. Royal Society Open Science, 8, Article 201494.
Hardwicke, T. E., Wallach, J. D., Kidwell, M. C., Bendixen, T., Crüwell, S., & Ioannidis, J. P. A. (2020). An empirical assessment of transparency and reproducibility-related research practices in the social sciences (2014–2017). Royal Society Open Science, 7, Article 190806.
Hundt, M. (2021). On models and modelling. World Englishes, 40(3), 298–317.
In’nami, Y., Mizumoto, A., Plonsky, L., & Koizumi, R. (2022). Promoting computationally reproducible research in applied linguistics: Recommended practices and considerations. Research Methods in Applied Linguistics, 1(3), Article 100030.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532.
Kytö, M., & Smitterberg, E. (2015). Diachronic registers. In D. Biber & R. Reppen (Eds.), The Cambridge handbook of English corpus linguistics (pp. 330–345). Cambridge University Press.
Laurinavichyute, A., Yadav, H., & Vasishth, S. (2022). Share the code, not just the data: A case study of the reproducibility of articles published in the Journal of Memory and Language under the open data policy. Journal of Memory and Language, 125, Article 104332.
Lee, D. Y. W. (2000). Modelling variation in spoken and written language: The multi-dimensional approach revisited [Unpublished doctoral dissertation]. Lancaster University.
Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What is your estimand? Defining the target quantity connects statistical evidence to theory. American Sociological Review, 86(3), 532–565.
McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and STAN (2nd ed.). Chapman and Hall/CRC.
McEnery, T., & Brezina, V. (2022). Fundamental principles of corpus linguistics (1st ed.). Cambridge University Press.
McEnery, T., & Hardie, A. (2011). Corpus linguistics: Method, theory and practice. Cambridge University Press.
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.
Mehl, S. (2021). What we talk about when we talk about corpus frequency: The example of polysemous verbs with light and concrete senses. Corpus Linguistics and Linguistic Theory, 17(1), 223–247.
National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and replicability in science. National Academies Press.
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., Fidler, F., Hilgard, J., Kline Struhl, M., Nuijten, M. B., Rohrer, J. M., Romero, F., Scheel, A. M., Scherer, L. D., Schönbrodt, F. D., & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719–748.
Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48, 1205–1226.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Pedersen, T. (2008). Empiricism is not a matter of faith. Computational Linguistics, 34(3), 465–470.
Peikert, A., & Brandmaier, A. M. (2021). A reproducible data analysis workflow with R Markdown, Git, Make, and Docker. Quantitative and Computational Methods in Behavioral Sciences, 1, Article e3763.
Peng, R. D., & Hicks, S. C. (2021). Reproducible research: A retrospective. Annual Review of Public Health, 42, 79–93.
Phillips, M. (1985). Aspects of text structure: An investigation of the lexical organisation of text. North-Holland.
Pietschnig, J., Siegel, M., Eder, J. S. N., & Gittler, G. (2019). Effect declines are systematic, strong, and ubiquitous: A meta-meta-analysis of the decline effect in intelligence research. Frontiers in Psychology, 10, Article 2874.
Porte, G., & McManus, K. (2018). Doing replication research in applied linguistics (1st ed.). Routledge.
Rastle, K. (2022). Improving reproducibility in the Journal of Memory and Language. Journal of Memory and Language, 126, Article 104351.
Schützler, O., & Schlüter, J. (Eds.). (2022). Data and methods in corpus linguistics: Comparative approaches [Supplemental material]. Cambridge University Press. [URL].
Sönning, L. (2024). Evaluation of keyness metrics: Performance and reliability. Corpus Linguistics and Linguistic Theory, 20(1), 263–288.
Sönning, L., & Grafmiller, J. (2024). Seeing the wood for the trees: Predictive margins for random forests. Corpus Linguistics and Linguistic Theory, 20(1), 153–181.
Sönning, L., & Krug, M. (2022). Comparing study designs and down-sampling strategies in corpus analysis: The importance of speaker metadata in the BNCs of 1994 and 2014. In O. Schützler & J. Schlüter (Eds.), Data and methods in corpus linguistics: Comparative approaches (pp. 127–160). Cambridge University Press.
Sönning, L., & Werner, V. (Eds.). (2021a). The replication crisis: Implications for linguistics [Special issue]. Linguistics, 59(5). [URL]
Sönning, L., & Werner, V. (2021b). The replication crisis, scientific revolutions, and linguistics. Linguistics, 59(5), 1179–1206.
Spence, J. R., & Stanley, D. J. (2016). Prediction interval: What to expect when you’re expecting … a replication. PLOS ONE, 11(9), Article e0162874.
Staudte, R. G., & Sheather, S. J. (1990). Robust estimation and testing (1st ed.). Wiley.
Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712.
Stefanowitsch, A. (2020). Corpus linguistics: A guide to the methodology. Language Science Press.
Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from test of significance — or vice versa. Journal of the American Statistical Association, 54(285), 30–34.
Stodden, V., Seiler, J., & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proceedings of the National Academy of Sciences, 115(11), 2584–2589.
Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Blackwell.
Szmrecsanyi, B., Biber, D., Egbert, J., & Franco, K. (2016). Toward more accountability: Modeling ternary genitive variation in Late Modern English. Language Variation and Change, 28(1), 1–29.
Trisovic, A., Lau, M. K., Pasquier, T., & Crosas, M. (2022). A large-scale study on research code quality and execution. Scientific Data, 9(1), 60.
Vanpaemel, W., Vermorgen, M., Deriemaecker, L., & Storms, G. (2015). Are we wasting a good crisis? The availability of psychological research data after the storm. Collabra, 1(1), 3.
Vasishth, S., & Gelman, A. (2021). How to embrace variation and accept uncertainty in linguistic and psycholinguistic data analysis. Linguistics, 59(5), 1311–1342.
Vetter, F. (2021). Issues of corpus comparability and register variation in the International Corpus of English: Theories and computer applications [Doctoral dissertation, Otto-Friedrich-Universität].
Wallis, S. (2017, February 16). The replication crisis: What does it mean for corpus linguistics? corp.ling.stats: statistics for corpus linguistics. [URL]
Wallis, S. (2019). Comparing χ² tables for separability of distribution and effect: Meta-tests for comparing homogeneity and goodness of fit contingency test outcomes. Journal of Quantitative Linguistics, 26(4), 330–355.
Wallis, S. (2020). Statistics in corpus linguistics research: A new approach (1st ed.). Routledge.
Wallis, S. (2022). Accurate confidence intervals on Binomial proportions, functions of proportions, algebraic formulae and effect sizes. [URL]
Wallis, S., & Mehl, S. (2022). Comparing baselines for corpus analysis: Research into the get-passive in speech and writing. In O. Schützler & J. Schlüter (Eds.), Data and methods in corpus linguistics: Comparative approaches (1st ed., pp. 101–126). Cambridge University Press.
Whitaker, K. (2017, September 26). Publishing a reproducible paper [Conference presentation]. Open science in practice summer school, Lausanne, Switzerland.
Wieling, M., Rawee, J., & van Noord, G. (2018). Reproducibility in computational linguistics: Are we willing to share? Computational Linguistics, 44(4), 641–649.
Wilcox, R. R. (2013). Introduction to robust estimation and hypothesis testing (3rd ed.). Academic Press.
Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., & Teal, T. K. (2017). Good enough practices in scientific computing. PLOS Computational Biology, 13(6), Article e1005510.
Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45(e1).
Young, C. (2018). Model uncertainty and the crisis in science. Socius: Sociological Research for a Dynamic World, 4.
Young, C., & Holsteen, K. (2017). Model uncertainty and robustness: A computational framework for multimodel analysis. Sociological Methods & Research, 46(1), 3–40.
Cited by (2)

Becker, Laura & Matías Guzmán Naranjo
2025. Authors’ response to “Replication and methodological robustness in quantitative typology”. Linguistic Typology 29:3, pp. 591 ff.
Frenken, Florian, Stephanie Evert, Gerold Schneider & Stella Neumann
2025. How stable are multivariate findings about register variation across varieties of English? On the replicability of Geometric Multivariate Analysis. ICAME Journal 49:1, pp. 23 ff.

This list is based on CrossRef data as of 21 November 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.