Leveraging a large language model for error analysis-based automatic feedback in interpreter training: An exploratory study

Liu, Wenjing; Pagano, Adriana

doi:10.1075/tcb.00101.liu

Article In: Humans, Machines, and Embedded Translation
Edited by Sandra L. Halverson and Jean Nitzke
[Translation, Cognition & Behavior 8:2] 2025

Wenjing Liu | Macao Polytechnic University

Adriana Pagano | Federal University of Minas Gerais

This content is being prepared for publication; it may be subject to changes.

Abstract

Feedback enables learners to improve performance and teachers to refine instruction. With advances in large language models (LLMs), automatic feedback has emerged as an efficient and innovative complement to traditional sources such as teacher, peer, and self-feedback. This study explores the integration of error analysis–based feedback generated by ChatGPT-4o into Chinese–Portuguese interpreter training. The model was prompted to detect and explain interpreting errors in aligned sentence pairs and to offer reference translations. We then evaluated the accuracy of these feedback components and the perceived usefulness of feedback through a questionnaire administered to two groups of stakeholders: interpreting teachers (as feedback providers) and interpreting trainees (as feedback users). Findings indicated that for the test set of sentences used, the LLM-generated feedback was rated as high quality, and both evaluator cohorts expressed favorable views on its usefulness in interpreter training. These results provide preliminary evidence that LLM-based feedback can serve as a valuable complement to human feedback in pedagogical contexts.

Keywords: feedback, interpreter training, large language models, ChatGPT, error analysis, Chinese–Portuguese simultaneous interpreting

Article outline

1. Introduction
2. Literature review
- 2.1 ChatGPT for feedback
- 2.2 Error analysis in interpreting quality assessment
3. The present study
4. Methodology
- 4.1 LLM feedback generation
  - 4.1.1 Prompt design
  - 4.1.2 Source and target speech selection
- 4.2 Evaluation instrument
  - 4.2.1 Sampling strategy and questionnaire grouping
  - 4.2.2 Questionnaire architecture
- 4.3 Participants
5. Findings
- 5.1 Inter-rater consistency in different questionnaire groups
- 5.2 LLM-Generated feedback quality evaluation
  - 5.2.1 Error type identification accuracy
  - 5.2.2 Error explanation accuracy and reference translation adequacy
- 5.3 LLM-Generated feedback usefulness perception
- 5.4 Use of LLM in interpreter training
6. Discussion
- 6.1 Quality of AI-generated feedback
- 6.2 New Dynamics of feedback mechanism in interpreter training
7. Conclusion
Artificial Intelligence Statement
Note
References
