Article published in: Interpreting
Vol. 23:2 (2021), pp. 245–268
Assessing the fidelity of consecutive interpreting
The effects of using source versus target text as the reference material
Published online: 5 February 2021
https://doi.org/10.1075/intp.00058.han
Abstract
The study reported on in this article pertains to rater-mediated assessment of English-to-Chinese consecutive interpreting, particularly informational correspondence between an originally intended message and an actually rendered message, also known as “fidelity” in Interpreting Studies. Previous literature has documented two main methods to assess fidelity: comparing actual renditions with the source text or with an exemplar rendition carefully prepared by experts (i.e., an ideal target text). However, little is known about the potential effects of these methods on fidelity assessment. We therefore conducted the study to explore the way in which these methods would affect rater reliability, fidelity ratings and rater perception. Our analysis of quantitative data shows that the raters tended to be less reliable, less self-consistent, less lenient and less comfortable when using the source English text (i.e., Condition A) than when using the target Chinese text (i.e., Condition B: the exemplar rendition). These findings were backed up and explained by emerging themes derived from the qualitative questionnaire data. The fidelity estimates in the two conditions were also found to be strongly correlated. We discuss these findings and entertain the possibility of recruiting untrained monolinguals or bilinguals to assess fidelity of interpreting.
Article outline
- Introduction
- Interpreting assessment: An overview of practice and research
- Assessment of fidelity in interpreting
- Source versus target texts as the reference material
- Research questions
- Method
- Data source
- Raters
- Experimental design
- Rater training
- Rating procedure
- Post-hoc questionnaire
- Data analysis
- Results
- Effects on rater reliability
- Effects on the fidelity ratings
- Raters’ perceptions of the reference materials
- Discussion
- Conclusion
- Acknowledgment
References
Angelelli, C. & Jacobson, H. E. (Eds.) (2009). Testing and assessment in translation and interpreting studies. Amsterdam: John Benjamins.
Barik, H. C. (1975). Simultaneous interpretation: Temporal and quantitative data. Language and Speech 16 (3), 237–270.
Bühler, H. (1986). Linguistic (semantic) and extra-linguistic (pragmatic) criteria for the evaluation of conference interpretation and interpreters. Multilingua 5 (4), 231–235.
Campbell, S. & Hale, S. (2003). Translation and interpreting assessment in the context of educational measurement. In G. Anderman & M. Rogers (Eds.), Translation today: Trends and perspectives. Clevedon: Multilingual Matters, 205–224.
Carroll, J. B. (1966). An experiment in evaluating the quality of translations. Mechanical Translation and Computational Linguistics 9 (3–4), 55–66.
Chesterman, A. (2016). Memes of translation: The spread of ideas in translation theory (revised edition). Amsterdam: John Benjamins.
Coughlin, D. (2003). Correlating automated and human assessments of machine translation quality. Retrieved from <[URL]>
Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Frankfurt am Main: Peter Lang.
Gerver, D. (1969/2002). The effects of source language presentation rate on the performance of simultaneous conference interpreters. In F. Pöchhacker & M. Shlesinger (Eds.), The interpreting studies reader. London: Routledge, 53–66.
Gile, D. (1995). Fidelity assessment in consecutive interpretation: An experiment. Target 7 (1), 151–164.
(1999). Variability in the perception of fidelity in simultaneous interpretation. Hermes 22, 51–79.
(2009). Interpreting studies: A critical view from within. MonTI 1, 135–155. [URL].
Hamidi, M. & Pöchhacker, F. (2007). Simultaneous consecutive interpreting: A new technique put to the test. Meta 52 (2), 276–289.
Han, C. (2015). Investigating rater severity/leniency in interpreter performance testing: A multifaceted Rasch measurement approach. Interpreting 17 (2), 255–283.
(2016). Investigating score dependability in English/Chinese interpreter certification performance testing: A generalizability theory approach. Language Assessment Quarterly 13 (3), 186–201.
(2017). Using analytic rating scales to assess English/Chinese bidirectional interpretation: A longitudinal Rasch analysis of scale utility and rater behavior. Linguistica Antverpiensia New Series – Themes in Translation Studies 16, 196–215.
(2018a). Using rating scales to assess interpretation: Practices, problems and prospects. Interpreting 20 (1), 59–95.
(2018b). Latent trait modelling of rater accuracy in formative peer assessment of English–Chinese consecutive interpreting. Assessment & Evaluation in Higher Education 43 (6), 979–994.
(2019). A generalizability theory study of optimal measurement design for a summative assessment of English/Chinese consecutive interpreting. Language Testing 36 (3), 419–438.
Hlavac, J. (2013). A cross-national overview of translator and interpreter certification procedures. Translation & Interpreting 5 (1), 32–65.
Lee, J. (2008). Rating scales for interpreting performance assessment. The Interpreter and Translator Trainer 2 (2), 165–184.
Lee, S-B. (2015). Developing an analytic scale for assessing undergraduate students’ consecutive interpreting performances. Interpreting 17 (2), 226–254.
(2019). Holistic assessment of consecutive interpretation: How interpreter trainers rate student performance. Interpreting 21 (2), 245–269.
Lee, T-H. (1999). Simultaneous listening and speaking in English into Korean simultaneous interpretation. Meta 44 (1), 560–572.
Liu, M-H. (2004). Working memory and expertise in simultaneous interpreting. Interpreting 6 (1), 19–42.
(2013). Design and analysis of Taiwan’s interpretation certification examination. In D. Tsagari & R. van Deemter (Eds.), Assessment issues in language translation and interpreting. Frankfurt: Peter Lang, 163–178.
Liu, M-H., Chang, C-C. & Wu, S-C. (2008). Interpretation evaluation practices: Comparison of eleven schools in Taiwan, China, Britain, and the USA. Compilation and Translation Review 1 (1), 1–42.
Liu, M-H. & Chiu, Y-H. (2009). Assessing source material difficulty for consecutive interpreting: Quantifiable measures and holistic judgment. Interpreting 11 (2), 244–266.
Meuleman, C. & Van Besien, F. (2009). Coping with extreme speech conditions in simultaneous interpreting. Interpreting 11 (1), 20–34.
Myford, C. M. & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement 4 (4), 386–422.
Sawyer, D. B. (2004). Fundamental aspects of interpreter education: Curriculum and assessment. Amsterdam: John Benjamins.
Setton, R. & Dawrant, A. (2016). Conference interpreting: A trainer’s guide. Amsterdam: John Benjamins.
Setton, R. & Motta, M. (2007). Syntacrobatics: Quality and reformulation in simultaneous-with-text. Interpreting 9 (2), 199–230.
Skaaden, H. (2013). Assessing interpreter aptitude in a variety of languages. In D. Tsagari & R. van Deemter (Eds.), Assessment issues in language translation and interpreting. Frankfurt: Peter Lang, 35–50.
Stemler, S. E. & Tsai, J. (2008). Best practices in estimating interrater reliability: Three common approaches. In J. Osborne (Ed.), Best practices in quantitative methods. Thousand Oaks, CA: Sage, 29–49.
Tiselius, E. (2009). Revisiting Carroll’s scales. In C. V. Angelelli & H. E. Jacobson (Eds.), Testing and assessment in translation and interpreting studies. Amsterdam: John Benjamins, 95–121.
Tommola, J. & Helevä, M. (1998). Language direction and source text complexity: Effects on trainee performance in simultaneous interpreting. In L. Bowker, M. Cronin, D. Kenny & J. Pearson (Eds.), Unity in diversity? Current trends in translation studies. Manchester: St Jerome, 177–186.
Vermeiren, H., Gucht, J. V. & De Bontridder, L. (2009). Standards as critical success factors in assessments: Certifying social interpreters in Flanders, Belgium. In C. V. Angelelli & H. E. Jacobson (Eds.), Testing and assessment in translation and interpreting studies. Amsterdam: John Benjamins, 291–330.
Wang, W-W., Xu, Y., Wang, B-H. & Mu, L. (2020). Developing interpreting competence scales in China. Frontiers in Psychology 11, 481.
Wu, J., Liu, M. & Liao, C. (2013). Analytic scoring in interpretation test: Construct validity and the halo effect. In H-H. Liao, T-E. Kao & Y. Lin (Eds.), The making of a translator: Multiple perspectives. Taipei: Bookman, 277–292.
Cited by 16 other publications
Chen, Shiyue & Yan Lin
Han, Chao, Shirong Chen & Jia Feng
2025. Modeling rater cognition in translation assessment. Target. International Journal of Translation Studies 37:4 ► pp. 590 ff.
Han, Chao, Mengting Jiang & Qionglu Chen
Han, Chao, Xiaolei Lu & Shirong Chen
Han, Chao & Yueqing Wang
2025. Conducting replication in translation and interpreting studies. Target. International Journal of Translation Studies 37:3 ► pp. 444 ff.
Li, Yang, Xini Liao & Jia Jia
Al-Amin, Md., Fatematuz Zahra Saqui & Md. Rabbi Khan
Wu, Guo-hua, You-jie Guo, Fang-zhou Qi, Shen Zhang, Yi-xiao Wang, Xin Tong & Liang Zhang
Cai, Rendong, Jiexuan Lin & Yanping Dong
Han, Chao, Juan Hu & Yi Deng
2023. Effects of language background and directionality on raters’ assessments of spoken-language interpreting. Revista Española de Lingüística Aplicada/Spanish Journal of Applied Linguistics 36:2 ► pp. 556 ff.
Han, Chao & Xiaolei Lu
Han, Chao & Xiaolei Lu
Han, Chao & Xiaoqi Shang
2023. An item-based, Rasch-calibrated approach to assessing translation quality. Target. International Journal of Translation Studies 35:1 ► pp. 63 ff.
Zhao, Nan
Chen, Jing, Huabo Yang & Chao Han
This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
