Article published in: Translation Spaces
Vol. 11:2 (2022), pp. 213–233
Some Translation Studies informed suggestions for further balancing methodologies for machine translation quality evaluation
Published online: 18 March 2022
https://doi.org/10.1075/ts.21026.kru
Abstract
This article intends to contribute to the current debate on the quality of neural machine translation (NMT) compared with (professional) human translation, a debate in which claims of (super)human performance by NMT systems have recently emerged. The article critically analyses several current machine translation (MT) quality evaluation methodologies employed in studies that claim such performance for their MT systems. This analysis aims to identify areas where these methodologies are potentially biased in favour of MT and may therefore overvalue MT performance while undervaluing human translation performance. The article then offers some Translation Studies informed suggestions for improving or debiasing these methodologies in order to arrive at a more balanced picture of MT vs. (professional) human translation quality.
Article outline
- 1. Introduction
- 2. The need for properly balanced MT quality evaluation methodologies
- 3. The current debate on (super)human performance of NMT
- 3.1 Google: Bridging the gap between human and machine translation
- 3.2 Microsoft: Parity between professional human and machine translation
- 3.3 Criticism of Microsoft’s evaluation methodology
- 3.4 CUBBIT: Human translation is not the upper bound of translation quality
- 4. Suggestions for further balancing MT quality evaluation methodologies
- 4.1 The quality of the human reference translations against which MT quality is to be measured
- 4.2 The extent of translational context taken into consideration in the MT quality evaluation campaign
- 4.3 Weighing translation errors according to their severity
- 4.4 Integrating MT systems into high-quality translation settings in order to measure the added value of professional human translators
- 5. Areas where current NMT systems necessarily underperform compared to professional human translators
- 6. Conclusion
- Acknowledgements
- Notes
References
ELIS. 2021. European Language Industry Survey. Accessed June 9, 2021. [URL]
ErgoTrans. 2015. Final Report: Cognitive and Physical Ergonomics of Translation (ErgoTrans). Accessed June 24, 2021. [URL]
Freitag, Markus, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, and Wolfgang Macherey. 2021. “Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation.” arXiv. Accessed June 9, 2021. [URL]
Grice, Herbert P. 1975. “Logic and Conversation.” In Syntax and Semantics. Volume 3, edited by Peter Cole and Jerry L. Morgan. 41–58. New York: Academic Press.
Hassan, Hany, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, and Ming Zhou. 2018. “Achieving Human Parity on Automatic Chinese to English News Translation.” arXiv. Accessed June 9, 2021. [URL]
Horn-Helf, Brigitte. 1999. Technisches Übersetzen in Theorie und Praxis [The Theory and Practice of Technical Translation]. Tübingen/Basel: Francke.
House, Juliane. 2006. “Communicative Styles in English and German.” European Journal of English Studies 10(3), 249–267.
Kade, Otto. 1968. Zufall und Gesetzmäßigkeit in der Übersetzung [Coincidence and Regularities in Translation]. Leipzig: Verlag Enzyklopädie.
Krüger, Ralph. 2015. The Interface between Scientific and Technical Translation Studies and Cognitive Linguistics. With Particular Emphasis on Explicitation and Implicitation as Indicators of Translational Text-Context Interaction. Berlin: Frank & Timme.
Krüger, Ralph. 2016. “Situated LSP Translation from a Cognitive Translational Perspective.” Lebende Sprachen 61(2), 297–332.
Krüger, Ralph. 2020. “Explicitation in Neural Machine Translation.” Across Languages and Cultures 21(2), 195–216.
Läubli, Samuel, Rico Sennrich, and Martin Volk. 2018. “Has Machine Translation Achieved Human Parity? A Case for Document-Level Evaluation.” In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, edited by Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii. 4791–4796. Association for Computational Linguistics. Accessed June 9, 2021.
Läubli, Samuel, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, and Antonio Toral. 2020. “A Set of Recommendations for Assessing Human-Machine Parity in Language Translation.” Journal of Artificial Intelligence Research 67, 653–672. Accessed June 9, 2021.
Lommel, Arle. 2018. “Metrics for Translation Quality Assessment: A Case for Standardising Error Typologies.” In Translation Quality Assessment. From Principles to Practice, edited by Joss Moorkens, Sheila Castilho, Federico Gaspari, and Stephen Doherty. 109–127. Springer.
Lommel, Arle. 2020. “At Human Parity? A Skeptical Response to MT Quality Claims.” In Maschinelle Übersetzung für Übersetzungsprofis, edited by Jörg Porsiel. 185–197. BDÜ Fachverlag.
Macken, Lieve, Daniel Prou, and Arda Tezcan. 2020. “Quantifying the Effect of Machine Translation in a High-Quality Human Translation Production Process.” Informatics 7(2), 1–19. Accessed June 25, 2021. [URL]
Maruf, Sameen, Fahimeh Saleh, and Gholamreza Haffari. 2021. “A Survey on Document-Level Neural Machine Translation: Methods and Evaluation.” ACM Computing Surveys 54(2), 1–36. Accessed November 1, 2021.
Melby, Alan. 2019. “Bells MT (Machine Translation) Does Not Yet Ring.” Presentation at APTIF 9: Reality vs. Illusion: From Morse Code to Machine Translation.
Muzii, Luigi. 2021. “Close Call – Observations on Productivity, Talent Shortages, & Human Parity MT.” eMpTy Pages. Accessed June 12, 2021. [URL]
Nord, Christiane. 1997. Translating as a Purposeful Activity. Functionalist Approaches Explained. Manchester: St. Jerome.
Nord, Christiane. 2009. Textanalyse und Übersetzen. Theoretische Grundlagen, Methode und didaktische Anwendung einer übersetzungsrelevanten Textanalyse [Text Analysis and Translation. Theoretical Foundations, Method and Didactic Application of a Translation-Relevant Text Analysis]. 4th edition. Tübingen: Gross.
Popel, Martin, Marketa Tomkova, Jakub Tomek, Łukasz Kaiser, Jakob Uszkoreit, Ondřej Bojar, and Zdeněk Žabokrtský. 2020. “Transforming Machine Translation: A Deep Learning System Reaches News Translation Quality Comparable to Human Professionals.” Nature Communications 11, 1–15. Accessed June 9, 2021.
Pym, Anthony. 2020. “Translation, Risk Management and Cognition.” In The Routledge Handbook of Translation and Cognition, edited by Fabio Alves and Arnt Lykke Jakobsen. 445–458. New York: Routledge.
Reiß, Katharina, and Hans J. Vermeer. 1991. Grundlegung einer allgemeinen Translationstheorie [Laying the Foundations for a General Theory of Translation and Interpreting]. 2nd edition. Tübingen: Niemeyer.
Risku, Hanna. 2004. Translationsmanagement. Interkulturelle Fachkommunikation im Kommunikationszeitalter [Translation Management. Intercultural LSP Communication in the Communication Age]. Tübingen: Narr.
Sulubacak, Umut, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, and Jörg Tiedemann. 2020. “Multimodal Machine Translation through Visuals and Speech.” Machine Translation 34(2–3), 97–147.
Toral, Antonio, Sheila Castilho, Ken Hu, and Andy Way. 2018. “Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation.” In Proceedings of the Third Conference on Machine Translation: Research Papers, edited by Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, and Karin Verspoor. 113–123. Accessed June 9, 2021.
Vashee, Kirti. 2021a. “The Quest for Human Parity Machine Translation.” eMpTy Pages. Accessed November 6, 2021. [URL]
Vashee, Kirti. 2021b. “Understanding Machine Translation Quality: A Review.” eMpTy Pages. Accessed November 6, 2021. [URL]
Vashee, Kirti. 2021c. “The Human-in-the-Loop Driving MT Progress.” eMpTy Pages. Accessed November 6, 2021. [URL]
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jacob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” In Advances in Neural Information Processing Systems 30 (NIPS 2017), edited by Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett. 1–11. Accessed June 9, 2021. [URL]
Vieira, Lucas Nunes. 2020. “Machine Translation in the News. A Framing Analysis of the Written Press.” Translation Spaces 9(1), 98–122.
Way, Andy. 2019. “Machine Translation: Where Are We at Today?” In The Bloomsbury Companion to Language Industry Studies, edited by Erik Angelone, Maureen Ehrensberger-Dow, and Gary Massey. 311–332. Bloomsbury Academic.
Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.” arXiv. Accessed June 9, 2021. [URL]
Cited by 11 other publications
Chen, Luyu, Lilla Varga & Milad Mehdizadkhani
Fan, Rui & Yue Zhang
Yao, Xiaofang, Yong-Bin Kang & Anthony McCosker
Durr, Margarete
Li, Chen & Zhiyuan Sun
Moorkens, Joss
Zhou, Yanjun & Shuling Zhou
Li, Ruichao, Abdullah Mohd Nawi & Myoung Sook Kang
Yang, Yanxia, Runze Liu, Xingmin Qian & Jiayue Ni
Krüger, Ralph
This list is based on CrossRef data as of 6 December 2025. Please note that it may not be complete.
