Article published in: ITL - International Journal of Applied Linguistics
Vol. 172:1 (2021), pp. 3–25
Critical position paper
Towards simpler and more transparent quantitative research reports
Published online: 13 August 2020
https://doi.org/10.1075/itl.20010.van
Abstract
The average quantitative research report in applied linguistics is needlessly complicated. Articles with over fifty hypothesis tests are no exception, yet despite this onslaught of numbers, the patterns in the data often remain opaque even to readers well-versed in quantitative methods, let alone to colleagues, students, and non-academics without years of experience navigating results sections. I offer five suggestions for increasing both the transparency and the simplicity of quantitative research reports: (1) round numbers, (2) draw more graphs, (3) run and report fewer significance tests, (4) report simple rather than complex analyses when they yield essentially the same results, and (5) use online appendices liberally to document secondary analyses and to share code and data.
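
As a minimal illustration of suggestion (4), consider the following R sketch. It assumes a hypothetical cluster-randomised design with simulated data (all variable and object names are invented for this example); it is not an analysis taken from the article itself. Averaging the outcome per cluster and running Welch's t-test on the cluster means is the kind of simple analysis that, in such designs, typically yields essentially the same answer as a multilevel model while being far easier to report and to read.

# Illustrative sketch only: simulated data, hypothetical names.
set.seed(2020)

# 20 classes (clusters) of 15 learners each; whole classes are
# randomised to the control or treatment condition.
d <- data.frame(
  class     = rep(1:20, each = 15),
  condition = rep(c("control", "treatment"), each = 150)
)
class_effect <- rnorm(20, sd = 2)            # between-class variation
d$score <- 50 +
  3 * (d$condition == "treatment") +         # true treatment effect
  class_effect[d$class] +                    # cluster-level noise
  rnorm(nrow(d), sd = 5)                     # learner-level noise

# Simple cluster-level analysis: average per class, then Welch's t-test.
class_means <- aggregate(score ~ class + condition, data = d, FUN = mean)
t.test(score ~ condition, data = class_means)

For comparison, a multilevel model such as lme4::lmer(score ~ condition + (1 | class), data = d) would, for balanced data like these, produce essentially the same estimate of the treatment effect; the point of suggestion (4) is that the simpler analysis is then the better one to report.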
Article outline
- Round more
- Show the main results graphically
- Help readers get the gist of the results
- Show that the numerical results are relevant
- Forestall common misunderstandings
- Run and report far fewer significance tests
- Silly tests
- Tests in the output that are not relevant to the research question
- Omnibus tests followed by planned comparisons when testing a priori hypotheses
- Pseudo-exploratory significance tests
- Sometimes, simple analyses suffice
- Mixed repeated-measures ANOVA versus t-tests
- Multilevel models versus cluster-level analyses
- Nonparametric versus parametric tests
- Use appendices liberally
- Epilogue
- Acknowledgements
- Note
