An item-based, Rasch-calibrated approach to assessing translation quality
Published online: 15 September 2022
https://doi.org/10.1075/target.20052.han
Abstract
Item-based scoring has been advocated as a psychometrically robust approach to translation quality assessment, outperforming traditional neo-hermeneutic and error analysis methods. The past decade has witnessed a succession of item-based scoring methods being developed and trialed, ranging from calibration of dichotomous items to preselected item evaluation. Despite this progress, these methods seem to be undermined by several limitations, such as the inability to accommodate the multifaceted reality of translation quality assessment and inconsistent item calibration procedures. Against this background, we conducted a methodological exploration, utilizing what we call an item-based, Rasch-calibrated method, to measure translation quality. This new method, built on the sophisticated psychometric model of many-facet Rasch measurement, inherits the item concept from its predecessors, but addresses previous limitations. In this article, we demonstrate its operationalization and provide an initial body of empirical evidence supporting its reliability, validity, and utility, as well as discuss its potential applications.
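For readers unfamiliar with many-facet Rasch measurement, the core idea the article builds on can be illustrated with a minimal sketch of the dichotomous case: the log-odds that a candidate is credited for an item are modeled as person ability minus item difficulty minus rater severity, all on a common logit scale. The function below is an illustration under that standard formulation, not code or parameter values taken from the article:

```python
import math

def mfrm_probability(ability, item_difficulty, rater_severity):
    """Probability of a credited (correct) item response under a
    dichotomous many-facet Rasch model: log-odds of success equal
    person ability minus item difficulty minus rater severity,
    all expressed in logits. Illustrative sketch only."""
    logit = ability - item_difficulty - rater_severity
    return 1.0 / (1.0 + math.exp(-logit))

# A candidate whose ability matches the item's difficulty, scored by a
# rater of average (zero) severity, has a 0.5 chance of being credited.
p = mfrm_probability(ability=0.0, item_difficulty=0.0, rater_severity=0.0)
```

Calibration, as performed by software such as FACETS, estimates these ability, difficulty, and severity parameters jointly from the observed item scores, so that candidate measures are adjusted for which raters and items they happened to encounter.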
Article outline
- 1. Introduction
- 2. Literature review
- 2.1 The developmental trajectory of item-based assessment
- 2.2 Critical evaluation of CDI and PIE
- 2.3 Item-based, Rasch-calibrated evaluation
- 3. Method
- 3.1 Samples of English-to-Chinese translation
- 3.2 Annotators and raters
- 3.3 Item selection
- 3.4 Item scoring
- 3.5 Item calibration
- 3.6 Post-hoc questionnaire and interview
- 3.7 Data analysis
- 4. Results
- 4.1 Reliability evidence
- 4.2 Validity evidence
- 4.3 Perceived utility
- 5. Discussion
- 6. Conclusion
- Acknowledgements
- Notes
References
Angelelli, Claudia V. 2009. “Using a Rubric to Assess Translation Ability: Defining the Construct.” In Testing and Assessment in Translation and Interpreting Studies: A Call for Dialogue between Research and Practice, edited by Claudia V. Angelelli and Holly E. Jacobson, 13–47. Amsterdam: John Benjamins.
Bachman, Lyle F. 2004. Statistical Analyses for Language Assessment. Cambridge: Cambridge University Press.
Bond, Trevor G., and Christine M. Fox. 2015. Applying the Rasch Model: Fundamental Measurement in the Human Sciences. 3rd ed. New York: Routledge.
Colina, Sonia. 2008. “Translation Quality Evaluation: Some Empirical Evidence for a Functionalist Approach.” The Translator 14 (1): 97–134.
Colina, Sonia. 2009. “Further Evidence for a Functionalist Approach to Translation Quality Evaluation.” Target 21 (2): 235–264.
Eckes, Thomas. 2015. Introduction to Many-Facet Rasch Measurement: Analyzing and Evaluating Rater-Mediated Assessments. 2nd ed. Frankfurt am Main: Peter Lang.
Eyckmans, June, and Philippe Anckaert. 2017. “Item-based Assessment of Translation Competence: Chimera of Objectivity Versus Prospect of Reliable Measurement.” In Translator Quality – Translation Quality: Empirical Approaches to Assessment and Evaluation, edited by Geoffrey S. Koby and Isabel Lacruz, special issue of Linguistica Antverpiensia 16: 40–56.
Eyckmans, June, Philippe Anckaert, and Winibert Segers. 2009. “The Perks of Norm Referenced Translation Evaluation.” In Testing and Assessment in Translation and Interpreting Studies: A Call for Dialogue between Research and Practice, edited by Claudia V. Angelelli and Holly E. Jacobson, 73–93. Amsterdam: John Benjamins.
Eyckmans, June, Winibert Segers, and Philippe Anckaert. 2012. “Translation Assessment Methodology and the Prospects of European Collaboration.” In Collaboration in Language Testing and Assessment, edited by Dina Tsagari and Ildikó Csépes, 171–184. Frankfurt am Main: Peter Lang.
Han, Chao. 2015. “Investigating Rater Severity/Leniency in Interpreter Performance Testing: A Multifaceted Rasch Measurement Approach.” Interpreting 17 (2): 255–283.
Han, Chao. 2016. “Investigating Score Dependability in English/Chinese Interpreter Certification Performance Testing: A Generalizability Theory Approach.” Language Assessment Quarterly 13 (3): 186–201.
Han, Chao. 2017. “Using Analytic Rating Scales to Assess English/Chinese Bi-directional Interpretation: A Longitudinal Rasch Analysis of Scale Utility and Rater Behavior.” In Translator Quality – Translation Quality: Empirical Approaches to Assessment and Evaluation, edited by Geoffrey S. Koby and Isabel Lacruz, special issue of Linguistica Antverpiensia 16: 196–215.
Han, Chao. 2019. “A Generalizability Theory Study of Optimal Measurement Design for a Summative Assessment of English/Chinese Consecutive Interpreting.” Language Testing 36 (3): 419–438.
Han, Chao. 2020. “Translation Quality Assessment: A Critical Methodological Review.” The Translator 26 (3): 257–273.
Han, Chao, Rui Xiao, and Wei Su. 2021. “Assessing the Fidelity of Consecutive Interpreting: The Effects of Using Source Versus Target Text as the Reference Material.” Interpreting 23 (2): 245–268.
Kockaert, Hendrik J., and Winibert Segers. 2012. “L’assurance qualité des traductions: items sélectionnés et évaluation assistée par ordinateur [Quality assurance of translations: Selected items and computer-assisted evaluation].” Meta 57 (1): 159–176.
Kockaert, Hendrik J., and Winibert Segers. 2017. “Evaluation of Legal Translations: PIE Method (Preselected Items Evaluation).” JoSTrans 27: 148–163.
Lauscher, Susanne. 2000. “Translation Quality Assessment: Where Can Theory and Practice Meet?” The Translator 6 (2): 149–168.
Linacre, John M. 2002. “What Do Infit and Outfit, Mean-Square and Standardized Mean?” Rasch Measurement Transactions 16 (2): 878.
Linacre, John M. 2017. FACETS: Computer Program for Many Faceted Rasch Measurement. V. 3.80.0. Beaverton, OR: Winsteps.
Martínez Mateo, Roberto. 2014. “A Deeper Look into Metrics for Translation Quality Assessment (TQA): A Case Study.” Miscelánea 49: 73–93.
McAlester, Gerard. 2000. “The Evaluation of Translation into a Foreign Language.” In Developing Translation Competence, edited by Christina Schäffner and Beverly Adab, 229–241. Amsterdam: John Benjamins.
Myford, Carol M., and Edward W. Wolfe. 2003. “Detecting and Measuring Rater Effects Using Many-Facet Rasch Measurement: Part I.” Journal of Applied Measurement 4 (4): 386–422.
O’Brien, Sharon. 2012. “Towards a Dynamic Quality Evaluation Model for Translation.” JoSTrans 17: 55–77.
Pym, Anthony. 1992. “Translation Error Analysis and the Interface with Language Teaching.” In Teaching Translation and Interpreting: Training, Talent and Experience. Papers from the First Language International Conference, Elsinore, Denmark, 1991, edited by Cay Dollerup and Anne Loddegaard, 279–288. Amsterdam: John Benjamins.
Rasch, Georg. 1980. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: MESA Press.
Teague, Ben. 1987. “ATA Accreditation and Excellence in Practice.” In Translation Excellence: Assessment, Achievement, Maintenance, edited by Marilyn Gaddis Rose, 21–26. Amsterdam: John Benjamins.
Turner, Barry, Miranda Lai, and Neng Huang. 2010. “Error Deduction and Descriptors – A Comparison of Two Methods of Translation Test Assessment.” Translation & Interpreting 2 (1): 11–23.
Waddington, Christopher. 2001. “Should Translations Be Assessed Holistically or Through Error Analysis?” Hermes 26: 15–38.
