In: Translation in Transition: Human and machine intelligence
Edited by Isabel Lacruz
[American Translators Association Scholarly Monograph Series XX] 2023
pp. 83–103
Chapter 5. Recent claims of human-machine parity in translation highlight core issues surrounding the human evaluation of machine translation
Published online: 26 July 2023
https://doi.org/10.1075/ata.xx.05gil
Abstract
In 2018, the first claims of empirical backing for
human-machine parity in translation (HMPT) emerged at the WMT18 Conference
on Machine Translation and in a study using WMT resources. Other researchers
quickly refuted these claims, pointing to a flawed human evaluation
campaign. Subsequent HMPT claims at WMT19 were also empirically refuted.
This chapter discusses the evolution of recommendations for human evaluation
of MT stemming from these claims of HMPT and evaluates possibilities of HMPT
at WMT20 in the context of these recommendations. Finally, we summarize all
criteria for human evaluation of MT based on recent literature.
Article outline
- 1. Introduction
- 2. 2018: First claims of human-machine parity in translation
- 2.1 Human evaluation of MT: Metrics, metrics, metrics
- 2.2 Critics of Hassan et al.’s (2018) claims of HMPT
- 3. WMT19: Additional claims of HMPT or even machine super-performance
- 4. WMT20: Continued innovation and greater caution
- 5. Conclusion
- Notes
- References
References (33)
Barrault, Loïc, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, et al. 2020. “Findings of the 2020 Conference on Machine Translation (WMT20).” In Proceedings of the Fifth Conference on Machine Translation, 1–54. Online: Association for Computational Linguistics. [URL]
Barrault, Loïc, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, et al. 2019. “Findings of the 2019 Conference on Machine Translation (WMT19).” In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), 1–61. Florence, Italy: Association for Computational Linguistics.
Bentivogli, Luisa, Mauro Cettolo, Marcello Federico, and Christian Federmann. 2018. “Machine Translation Human Evaluation: An Investigation of Evaluation Based on Post-Editing and Its Relation with Direct Assessment.” In 15th International Workshop on Spoken Language Translation 2018, 62–69. [URL]
Bernth, Arendse, and Claudia Gdaniec. 2001. “MTranslatability.” Machine Translation 16 (3): 175–218.
Bojar, Ondřej, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, et al. 2016. “Findings of the 2016 Conference on Machine Translation.” In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, 131–198. Berlin: Association for Computational Linguistics.
Bojar, Ondřej, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, and Christof Monz. 2018. “Findings of the 2018 Conference on Machine Translation (WMT18).” In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, 272–303. Brussels: Association for Computational Linguistics.
Callison-Burch, Chris. 2009. “Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk.” In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 286–95.
Castilho, Sheila, Joss Moorkens, Federico Gaspari, Andy Way, Panayota Georgakopoulou, Maria Gialama, Vilelmini Sosoni, and Rico Sennrich. 2017. “Crowdsourcing for NMT Evaluation: Professional Translators versus the Crowd.” Translating and the Computer 39. [URL]
Castilho, Sheila, and Sharon O’Brien. 2017. “Acceptability of Machine-Translated Content: A Multi-Language Evaluation by Translators and End-Users.” Linguistica Antverpiensia, New Series–Themes in Translation Studies 16.
Castilho, Sheila, Sharon O’Brien, Fabio Alves, and Morgan O’Brien. 2014. “Does Post-Editing Increase Usability? A Study with Brazilian Portuguese as Target Language.” In Proceedings of the 17th Annual Conference of the European Association for Machine Translation, 183–190. Association for Computational Linguistics.
Egdom, G. M. W. van, and Mark Pluymaekers. 2019. “Why Go the Extra Mile? How Different Degrees of Post-Editing Affect Perceptions of Texts, Senders and Products among End Users.” Journal of Specialised Translation 31: 158–76.
Graham, Yvette, Christian Federmann, Maria Eskevich, and Barry Haddow. 2020a. “Assessing Human-Parity in Machine Translation on the Segment Level.” In Findings of the Association for Computational Linguistics: EMNLP 2020, 4199–4207. Online: Association for Computational Linguistics.
Graham, Yvette, Barry Haddow, and Philipp Koehn. 2020b. “Statistical Power and Translationese in Machine Translation Evaluation.” In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 72–81. Online: Association for Computational Linguistics. [URL]
Grimaila, Annette, and John Chandioux. 1992. “Made to Measure Solutions.” In Computers in Translation: A Practical Appraisal, ed. by John Newton, 33–45. London: Routledge.
Grundkiewicz, Roman, Marcin Junczys-Dowmunt, Christian Federmann, and Tom Kocmi. 2021. “On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs.” In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval). Online from Kyiv, Ukraine. [URL]
Hassan, Hany, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, et al. 2018. “Achieving Human Parity on Automatic Chinese to English News Translation.” [URL]
Jääskeläinen, Riitta. 2010. “Are All Professionals Experts? Definitions of Expertise and Reinterpretation of Research Evidence.” Translation and Cognition 15: 213–27.
Kocmi, Tom, Tomasz Limisiewicz, and Gabriel Stanovsky. 2020. “Gender Coreference and Bias Evaluation at WMT 2020.” In Proceedings of the Fifth Conference on Machine Translation. Online: Association for Computational Linguistics.
Läubli, Samuel, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, and Antonio Toral. 2020. “A Set of Recommendations for Assessing Human-Machine Parity in Language Translation.” Journal of Artificial Intelligence Research 67 (March).
Läubli, Samuel, Rico Sennrich, and Martin Volk. 2018. “Has Machine Translation Achieved Human Parity? A Case for Document-Level Evaluation.” ArXiv Preprint ArXiv:1808.07048.
Ng, Nathan, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, and Sergey Edunov. 2019. “Facebook FAIR’s WMT19 News Translation Task Submission.” In Proceedings of the Fourth Conference on Machine Translation. Florence, Italy: Association for Computational Linguistics. [URL]
O’Brien, Sharon. 2013. “The Borrowers: Researching the Cognitive Aspects of Translation.” Target. International Journal of Translation Studies 25 (1): 5–17.
Popel, Martin. 2018. “CUNI Transformer Neural MT System for WMT18.” In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, 482–87. Brussels, Belgium: Association for Computational Linguistics.
Popel, Martin, Marketa Tomkova, Jakub Tomek, Łukasz Kaiser, Jakob Uszkoreit, Ondřej Bojar, and Zdeněk Žabokrtský. 2020. “Transforming Machine Translation: A Deep Learning System Reaches News Translation Quality Comparable to Human Professionals.” Nature Communications 11 (1): 4381.
Scarton, Carolina, Mikel L. Forcada, Miquel Esplà-Gomis, and Lucia Specia. 2019. “Estimating Post-Editing Effort: A Study on Human Judgements, Task-Based and Reference-Based Metrics of MT Quality.” In Zenodo 16th International Workshop on Spoken Language Translation. Hong Kong. [URL]
Shreve, Gregory M., and Erik Angelone, eds. 2010. Translation and Cognition. American Translators Association Scholarly Monograph Series, v. 15. Amsterdam; Philadelphia: John Benjamins Pub. Co.
Snover, Matthew, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. “A Study of Translation Edit Rate with Targeted Human Annotation.” In Proceedings of the 7th Conference of the Association for Machine Translation of the Americas, 223–31. Cambridge, Massachusetts: Association for Machine Translation in the Americas. [URL]
Tomolonis, Tommy. 2020. Discussion with Tommy Tomolonis, Automation Technology Specialist at Morningside Translations.
Toral, Antonio. 2020. “Reassessing Claims of Human Parity and Super-Human Performance in Machine Translation at WMT 2019.” ArXiv:2005.05738 [Cs], May. [URL]
Toral, Antonio, Sheila Castilho, Ke Hu, and Andy Way. 2018. “Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation.” ArXiv Preprint ArXiv:1808.10432. [URL]
Toral, Antonio, and Andy Way. 2018. “What Level of Quality Can Neural Machine Translation Attain on Literary Text?” In Translation Quality Assessment, 263–87. Springer. [URL]
