In: Crossroads Semantics: Computation, experiment and grammar
Edited by Hilke Reckman, Lisa Lai-Shen Cheng, Maarten Hijzelendoorn and Rint Sybesma
[Not in series 210] 2017
pp. 23–37
Chapter 2. Experimental research
Problems and opportunities in the big-data era
Published online: 12 April 2017
https://doi.org/10.1075/z.210.02cre
Abstract
Experimental research in psychology, psycholinguistics and medicine provides quantitative and therefore seemingly conclusive and trustworthy evidence. However, it has been convincingly argued that most published research findings are actually false. This has hardly influenced the dominant scientific evaluation system, which reflects a continued trust in the unbiasedness of data through a strong reliance on simple quantifications of scientific quality and productivity, such as the number of publications and the number of citations. This state of affairs is remarkable in the light of a long history of strong criticism of commonly used inference methods and scientific evaluation systems, criticism that is now backed by large-scale research projects directly questioning the reproducibility of scientific findings. In this way, large amounts of data – “big data” – have helped to uncover some of these problematic issues, but have also fostered a more open attitude towards data and code sharing. In addition, novel analytic frameworks may help to better integrate empirical data with computational models.
Article outline
- 1.Introduction
- 2.The questionable empirical toolbox
- 2.1Bias
- 2.2Null hypothesis testing
- 2.3Theory testing
- 3.Scientific publications and evaluation
- 3.1The economy of the publication and evaluation systems
- 3.2Alternatives for the evaluation system
- 4.More data, more problems?
- Notes
- References
