Article published in: Advanced Quantitative Methods in Bi-/Multilingualism
Edited by Christos Pliatsikas, George Pontikas and Ian Cunnings
[Linguistic Approaches to Bilingualism 15:4] 2025, pp. 487–517
Optimising participant grouping methods in bilingualism studies
Insights from eye-tracking data
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Open Access publication of this article was funded through a Transformative Agreement with University of Birmingham.
Published online: 28 April 2025
https://doi.org/10.1075/lab.24019.he
Abstract
This research addresses two major challenges in studying second language acquisition and bilingualism: reducing
overlap in predictor variables and correctly classifying participants into language proficiency levels. Too many relevant
predictors can harm statistical analysis due to an increased chance of overlap, known as multicollinearity. To tackle this, we use
Principal Component Analysis (PCA) on selected predictors to identify proficiency indicators, combining the length of stay in the
UK and language test scores. Additionally, traditional methods, especially IELTS-based proficiency classifications, often miss
subtle differences in language skills, particularly when they fail to consider how long participants have been exposed to the
target language. We counter this by using non-hierarchical Cluster Analysis (NCA) for a grounded, data-driven way of detecting
distinct language proficiency groups. This new approach is demonstrated on a dataset of eye movements from reading tasks,
collected from Chinese–English bilinguals in the UK.
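The two-step idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the data are invented, the function names are hypothetical, and k-means is assumed here as one common non-hierarchical clustering method, since the abstract does not name the algorithm. The sketch standardises two correlated proficiency indicators (months in the UK and a test score), collapses them into a first principal component, and then clusters participants on that single score.

```python
import math
import statistics


def zscores(values):
    """Standardise values to mean 0 and sample SD 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def first_component(x, y):
    """Return PC1 scores and the proportion of variance PC1 explains.

    For two z-scored variables the correlation matrix is [[1, r], [r, 1]].
    Its largest eigenvalue is 1 + |r|, with eigenvector (1, sign(r)) / sqrt(2).
    """
    zx, zy = zscores(x), zscores(y)
    r = sum(a * b for a, b in zip(zx, zy)) / (len(zx) - 1)
    sign = 1.0 if r >= 0 else -1.0
    scores = [(a + sign * b) / math.sqrt(2) for a, b in zip(zx, zy)]
    return scores, (1 + abs(r)) / 2


def kmeans_1d(scores, k, iterations=100):
    """Lloyd's k-means in one dimension with evenly spaced starting centres."""
    lo, hi = min(scores), max(scores)
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(scores)
    for _ in range(iterations):
        # Assign each participant to the nearest centre.
        labels = [min(range(k), key=lambda j: abs(s - centres[j])) for s in scores]
        # Recompute each centre as the mean of its members.
        new_centres = []
        for j in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == j]
            new_centres.append(statistics.fmean(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Invented illustration data (NOT from the study): eight participants.
months_in_uk = [2, 3, 4, 6, 30, 36, 40, 48]
test_score = [5.5, 6.0, 5.5, 6.0, 7.0, 7.5, 7.0, 8.0]

pc1, explained = first_component(months_in_uk, test_score)
labels, centres = kmeans_1d(pc1, k=2)
print(labels)  # -> [0, 0, 0, 0, 1, 1, 1, 1]
```

Because the two indicators are merged into one component before clustering, the groups reflect both exposure and test performance rather than a cut-off on a single test score. With more than two predictors, a general PCA routine would replace the closed-form two-variable shortcut used here.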
Article outline
- 1. Introduction
- 1.1 Critical predictors in bilingual reading and the problem of multicollinearity
- 1.2 L2 reading models and developmental trajectories
- 1.3 Participant grouping methods in bilingualism research
- 1.4 This study
- 2. Methods
- 2.1 Participants
- 2.2 Language history questionnaires and vocabulary test
- 2.3 Reading stimuli
- 2.4 Apparatus
- 2.5 Procedure
- 2.6 Eye-Movement Variables
- 2.7 Data preparation
  - Random factors
  - Controls
  - Covariates
- 3. Data analysis and results
- 3.1 Clustering Chinese participants
  - Correlation analysis
  - Principal component analysis
- 3.2 Statistical modelling of first fixation durations
- 4. Discussion
- 5. Conclusions
- Acknowledgements
- Declaration of competing interest
- Data availability statement
- Notes
- References
