Article published in: Advanced Quantitative Methods in Bi-/Multilingualism
Edited by Christos Pliatsikas, George Pontikas and Ian Cunnings
[Linguistic Approaches to Bilingualism 15:4] 2025
► pp. 518–537
Normalization of timed measures in bilingualism research
Make it optimal with the Box-Cox transformation
Published online: 25 September 2024
https://doi.org/10.1075/lab.24017.kea
Abstract
The time it takes an individual to respond to a probe (e.g., a word, picture, or question) or to read a word or
phrase provides useful insights into cognitive processes. Consequently, timed measures are a staple in bilingualism research.
However, timed measures usually violate assumptions of linear models, one of which is that the residuals are normally distributed. Power
transformations are a common solution, but which of the many possible transformations to apply is often guesswork. Box and Cox (1964)
developed a procedure to estimate the exponent of the best-fitting normalizing transformation, the coefficient lambda (λ), and the
procedure is easy to run using standard R packages. This practical primer demonstrates how to
perform the Box-Cox transformation in R using as a testbed the distractor items from a recent eye-tracking study on sentence
reading in speakers of Spanish as a majority and a heritage language. The analyses show (a) that the exponents selected via the
Box-Cox procedure reduce positive skewness as well as or better than the natural log; (b) that the best-fitting value of λ varies
based on factors such as group and, in the case of eye-movement data, the measure of interest; and (c) that the choice of
transformation sometimes impacts p values for model estimates.
Keywords: Box-Cox transformation, reading times, response times, outliers, skewness
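For readers unfamiliar with the coefficient λ mentioned in the abstract, the standard one-parameter Box-Cox family (Box & Cox, 1964) is sketched below as background. The notation is an assumption made here for illustration and may differ from the notation used in the article itself.

```latex
% Box-Cox power transformation of a strictly positive response y
% (e.g., a reading time or response time in milliseconds)
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
  \ln y, & \lambda = 0,
\end{cases}
\qquad y > 0 .
```

Under this definition, λ = 1 leaves the shape of the distribution unchanged (the data are only shifted by 1), λ = 0 is the natural log, and λ = −1 is a rescaled reciprocal. The procedure chooses λ by maximum likelihood, so the natural log is one candidate among a continuum of exponents rather than a default.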
Article outline
- 1. Introduction
- 1.1 Data normalization
- 1.2 The Box-Cox transformation
- 2. Performing the Box-Cox transformation in R
- 2.1 The sample data
- 2.2 R packages and code
- 3. Distributions before and after transformation
- 4. Model comparisons
- 5. Conclusion
- Data availability
- Acknowledgements
References (18)
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random
effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and
Language, 68(3), 255–278.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting
linear mixed-effects models using lme4. Journal of Statistical
Software, 67(1), 1–48.
Box, G. E. P., & Cox, D. R. (1964). An
analysis of transformations. Journal of the Royal Statistical Society. Series B
(Methodological), 26(2), 211–252.
Burchill, Z. J., & Jaeger, T. F. (2024). How
reliable are standard reading time analyses? Hierarchical bootstrap reveals substantial power over-optimism and
scale-dependent Type I error inflation. Journal of Memory and
Language, 136, Article 104494.
Cuetos, F., Glez-Nosti, M., Barbón, A., & Brysbaert, M. (2011). SUBTLEX-ESP:
Spanish word frequencies based on film
subtitles. Psicológica, 32(2), 133–143. [URL]
Drummer, J.-D., & Felser, C. (2018). Cataphoric
pronoun resolution in native and non-native sentence comprehension. Journal of Memory and
Language, 101, 97–113.
Keating, G. D. (2022). The
effect of age of onset of bilingualism on gender agreement processing in Spanish as a heritage
language. Language
Learning, 72(4), 1170–1208.
Keating, G. D. (2024). Morphological
markedness and the temporal dynamics of gender agreement processing in Spanish as a majority and a heritage
language. Language Learning. Advance online
publication.
Nicklin, C., & Plonsky, L. (2020). Outliers
in L2 research in applied linguistics: A synthesis and data re-analysis. Annual Review of
Applied Linguistics, 40, 26–55.
Osborne, J. (2010). Improving
your data transformations: Applying the Box-Cox transformation. Practical Assessment, Research,
and Evaluation, 15, Article 12. [URL]
R Core Team (2023). R: A language and
environment for statistical computing (Version 4.3.1) [Computer
software]. R Foundation for Statistical Computing. Retrieved
from [URL]
Ratcliff, R. (1993). Methods
for dealing with reaction time outliers. Psychological
Bulletin, 114(3), 510–532.
Rayner, K. (1998). Eye
movements in reading and information processing: 20 years of research. Psychological
Bulletin, 124(3), 372–422.
Revelle, W. (2024). psych:
Procedures for psychological, psychometric, and personality research. Northwestern University, Evanston, Illinois.
SR Research (2005). EyeLink
1000 [Apparatus and software]. [URL]
