In: Meaningful Language Test Scores: Research to enhance score interpretation
Edited by Spiros Papageorgiou and Venessa F. Manna
[Innovations in Language Learning and Assessment 1] 2023
pp. 80–98
Chapter 5. Scale anchoring methodology for developing revised performance
level descriptors for the TOEFL iBT® test
Published online: 29 June 2023
https://doi.org/10.1075/illa.1.05wan
Abstract
A scale anchoring approach (Beaton & Allen, 1992) was initially used to
develop empirically based listening and reading performance descriptors
for the TOEFL iBT® test to help score users interpret scores. To further
facilitate score interpretation, scores were also mapped to the Common
European Framework of Reference (CEFR) levels through a standard setting
study (Tannenbaum & Wylie, 2008), and the score mapping results were
revised as additional data became available and feedback from score
users was received (Papageorgiou et al., 2015). Although the two
approaches, score mapping and scale anchoring, supported score
interpretation from distinct perspectives, they also created some
confusion for score users because the performance levels from the two
approaches were not fully aligned. The current study adopted a unified
approach (Powers et al., 2017) to revise the scale anchoring performance
descriptors by defining the proficiency levels based on the most recent
CEFR score mapping results (Papageorgiou et al., 2015). The performance
descriptors resulting from this unified approach were intended to
address score users’ needs for score interpretation by referencing the
widely known CEFR proficiency levels and by being meaningful and
relevant to the TOEFL iBT test.
Article outline
- Introduction
- Method
- Data
- Procedure
- Determining score proficiency levels
- Computing item p-values for the proficiency levels
- Identifying candidate anchor items at each proficiency level
- Developing descriptors from content characteristics of the anchor items
- Results
- Conclusion
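
The procedure steps named in the outline (assigning scores to proficiency levels, computing item p-values per level, and flagging candidate anchor items) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the cut scores in `CUTS`, the level labels, and the thresholds `hi`, `lo`, and `gap` are hypothetical and are not taken from the chapter or from the TOEFL iBT score mapping; the function names are likewise invented for this example.

```python
from bisect import bisect_right

# Hypothetical lower-bound cut scores for each level (NOT the TOEFL iBT values).
CUTS = [(0, "Below B1"), (10, "B1"), (18, "B2"), (24, "C1")]
LEVELS = [name for _, name in CUTS]


def level_of(score):
    """Map a scaled score to the highest level whose cut score it meets."""
    bounds = [cut for cut, _ in CUTS]
    return CUTS[bisect_right(bounds, score) - 1][1]


def p_values_by_level(records):
    """Compute the proportion correct per item within each level.

    records: list of (scaled_score, [0/1 responses, one per item]).
    Returns {level: [p-value for item 0, item 1, ...]}.
    """
    totals = {}  # level -> (number of test takers, per-item correct counts)
    for score, responses in records:
        lvl = level_of(score)
        n, correct = totals.get(lvl, (0, [0] * len(responses)))
        totals[lvl] = (n + 1, [c + r for c, r in zip(correct, responses)])
    return {lvl: [c / n for c in correct] for lvl, (n, correct) in totals.items()}


def candidate_anchors(pvals, levels=LEVELS, hi=0.65, lo=0.50, gap=0.30):
    """Flag items as candidate anchors under assumed, illustrative criteria.

    An item is a candidate at a level if its p-value there is at least `hi`
    and, when an adjacent lower level has data, its p-value at that lower
    level is below `lo` and at least `gap` smaller.
    """
    anchors = {}
    for i, lvl in enumerate(levels):
        if lvl not in pvals:
            continue
        chosen = []
        for item, p in enumerate(pvals[lvl]):
            if p < hi:
                continue
            if i > 0 and levels[i - 1] in pvals:
                below = pvals[levels[i - 1]][item]
                if below >= lo or p - below < gap:
                    continue
            chosen.append(item)
        anchors[lvl] = chosen
    return anchors
```

In a workflow like the one outlined, the content characteristics of the flagged items at each level would then be reviewed to draft the descriptors; that review step is a matter of expert judgment and is not modeled in this sketch.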
- References
- Appendix
References (15)
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Beaton, A., & Allen, N. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, 17(2), 191–204.
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press.
Council of Europe. (2009). Relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR). A Manual. Retrieved on 8 February 2023 from [URL]
Council of Europe. (2020). Common European Framework of Reference for Languages: Learning, teaching, assessment. Companion volume with new descriptors. Retrieved on 8 February 2023 from [URL]
ETS. (2020). TOEFL Research Insight Series Volume 3: Reliability and comparability of TOEFL iBT scores. Retrieved on 8 February 2023 from [URL]
Garcia Gomez, P., Noah, A., Schedl, M., Wright, C., & Yolkut, A. (2007). Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24(3), 417–435.
Haberman, S. J., Sinharay, S., & Lee, Y.-H. (2011). Statistical procedures to evaluate quality of scale anchoring (ETS Research Report RR–11–02). ETS.
Hambleton, R. K., & Zenisky, A. L. (2013). Reporting test scores in more meaningful ways: Some new findings, research methods, and guidelines for score report design. In K. F. Geisinger (Ed.), APA handbook of testing and assessment in psychology (pp. 479–494). American Psychological Association.
Papageorgiou, S., Tannenbaum, R. J., Bridgeman, B., & Cho, Y. (2015). The association between TOEFL iBT® test scores and the Common European Framework of Reference (CEFR) levels (ETS Research Memorandum RM–15–06). ETS. Retrieved on 8 February 2023 from [URL]
Powers, D., Schedl, M., & Papageorgiou, S. (2017). Facilitating the interpretation of English language proficiency scores: Combining scale anchoring and test score mapping methodologies. Language Testing, 34, 175–195.
Ryan, J. (2006). Practices, issues, and trends in student test score reporting. In S. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 677–710). Lawrence Erlbaum Associates.
