Some Translation Studies informed suggestions for further balancing methodologies for machine translation quality evaluation

Krüger, Ralph

doi:10.1075/ts.21026.kru

Article published in: Translation Spaces
Vol. 11:2 (2022), pp. 213–233



Ralph Krüger | TH Köln – University of Applied Sciences

Published online: 18 March 2022

https://doi.org/10.1075/ts.21026.kru

Abstract

This article intends to contribute to the current debate on the quality of neural machine translation (NMT) vs. that of (professional) human translation, where claims concerning (super)human performance of NMT systems have recently emerged. The article critically analyses some current machine translation (MT) quality evaluation methodologies employed in studies claiming such performance for their MT systems. This analysis aims to identify areas where these methodologies are potentially biased in favour of MT and hence may overvalue MT performance while undervaluing human translation performance. The article then provides some Translation Studies informed suggestions for improving or debiasing these methodologies in order to arrive at a more balanced picture of MT vs. (professional) human translation quality.

Keywords: machine translation quality evaluation, professional human translation, (super)human MT performance, MT bias, translation studies

Article outline

1. Introduction
2. The need for properly balanced MT quality evaluation methodologies
3. The current debate on (super)human performance of NMT
- 3.1 Google: Bridging the gap between human and machine translation
- 3.2 Microsoft: Parity between professional human and machine translation
- 3.3 Criticism of Microsoft’s evaluation methodology
- 3.4 CUBBIT: Human translation is not the upper bound of translation quality
4. Suggestions for further balancing MT quality evaluation methodologies
- 4.1 The quality of the human reference translations against which MT quality is to be measured
- 4.2 The extent of translational context taken into consideration in the MT quality evaluation campaign
- 4.3 Weighing translation errors according to their severity
- 4.4 Integrating MT systems into high-quality translation settings in order to measure the added value of professional human translators
5. Areas where current NMT systems necessarily underperform compared to professional human translators
6. Conclusion
Acknowledgements
Notes
References

