Article published In: Intelligences pour la traduction. IA et interculturel : actions et interactions.
Edited by Ludovica Maggi and Sarah Bordes
[FORUM 20:2] 2022
pp. 315–332
Evaluer, diagnostiquer et analyser la traduction automatique neuronale
Article language: French
Published online: 12 January 2023
https://doi.org/10.1075/forum.00023.yvo
Résumé
Les outils de traduction automatique (TA) neuronale ont fait des progrès sensibles, qui les rendent utilisables pour un nombre croissant de domaines et de couples de langues. Cette évolution majeure des technologies de
traduction invite à revisiter les méthodes de mesure de la qualité de la traduction, en particulier des mesures dites
automatiques, qui jouent un rôle fondamental pour orienter les nouveaux développements de ces systèmes. Dans cet article, nous
dressons un état des lieux des méthodes utilisées dans le cycle de développement des outils de traduction automatique, depuis les
évaluations purement quantitatives jusqu’aux méthodologies récemment proposées pour analyser et diagnostiquer le fonctionnement de
ces “boites noires” neuronales.
Abstract
Neural machine translation (MT) technologies have made significant progress, making them useful for an
increasing number of domains and language pairs. These major developments of translation technologies invite us to revisit our
methods for measuring translation quality, in particular the so-called “automatic metrics”, which play a fundamental role in
guiding the new developments of MT systems. In this work, we review the methods used in the development cycle of machine
translation tools, from purely quantitative evaluations to recently proposed methodologies aiming to analyse and diagnose the
functioning of these neural “black boxes”.
Article outline
- 1. Introduction
- 2. La TA Neuronale : Principes et concepts
- 2.1 Traduire par apprentissage
- 2.2 TAN : Traduction automatique numérique
- 2.3 Configurer Aθ : Le choix des méta-paramètres
- 3. Métriques automatiques : Le rôle des références humaines
- 3.1 Les évaluations globales
- 3.2 Évaluer sans référence
- 4. À la recherche des failles de la TA
- 4.1 Des bancs d’essais spécialisés
- 4.2 Évaluation par des manipulations linguistiques
- 4.2.1 Dans la phrase source
- 4.2.2 Dans la phrase cible
- 5. Sous le capot, le moteur (de traduction)
- 5.1 Analyse des représentations (sondes linguistiques)
- 6. Conclusion
- Remarques
Bibliographie
References (51)
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2015. “Neural
Machine Translation by Jointly Learning to Align and
Translate.” In Proceedings of the First International Conference on
Learning Representations. San Diego, CA.
Banerjee, Satanjeev, and Alon Lavie. 2005. “METEOR:
An Automatic Metric for MT Evaluation with Improved Correlation with Human
Judgments.” In Proceedings of the ACL Workshop on Intrinsic and
Extrinsic Evaluation Measures for Machine
Translation, 65–72. Ann Arbor, Michigan.
Bawden, Rachel, Rico Sennrich, Alexandra Birch, and Barry Haddow. 2018. “Evaluating
Discourse Phenomena in Neural Machine Translation.” In Proceedings of
the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies, Volume 1 (Long Papers), 1304–13. New Orleans, Louisiana.
Belinkov, Yonatan, and Yonatan Bisk. 2018. “Synthetic
and Natural Noise Both Break Neural Machine
Translation.” In International Conference on Learning
Representations.
Belinkov, Yonatan, and James Glass. 2019. “Analysis
Methods in Neural Language Processing: A Survey.” Transactions of the Association for
Computational
Linguistics 7 (April): 49–72.
Blanchon, Hervé, and Christian Boitet. 2007. “Pour
l’évaluation Externe Des Systèmes de TA Par Des méthodes Fondées Sur La tâche.” Traitement
Automatique Des
Langues 48 (1): 33–65.
Burchardt, Aljoscha, Vivien Macketanz, Jon Dehdari, Georg Heigold, Jan-Thorsten Peter, and Philip Williams. 2017. “A
Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines.” The Prague Bulletin
of Mathematical
Linguistics 108 (1): 159–70.
Burlot, Franck, and François Yvon. 2017. “Evaluating
the Morphological Competence of Machine Translation
Systems.” In Proceedings of the Second Conference on Machine
Translation, Volume 1: Research
Papers, 43–55. Copenhagen, Denmark.
Burlot, Franck, and François Yvon. 2018. “Evaluation
morphologique pour la traduction automatique: adaptation au
français.” In Conférence sur le Traitement Automatique des Langues
Naturelles, 14 pages. TALN. Rennes, France.
Castilho, Sheila, Stephen Doherty, Federico Gaspari, and Joss Moorkens. 2018. “Approaches
to Human and Machine Translation Quality Assessment.” In Translation
Quality
Assessment, 9–38. Springer.
Chatzikoumi, Eirini. 2020. “How
to Evaluate Machine Translation: A Review of Automated and Human Metrics.” Natural Language
Engineering 26 (2): 137–61.
Cho, Kyunghyun, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. “On
the Properties of Neural Machine Translation: Encoder-Decoder
Approaches.” In Proceedings of SSST-8, Eighth Workshop on Syntax,
Semantics and Structure in Statistical
Translation, 103–11. Doha, Qatar.
Conneau, Alexis, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. 2018. “What
You Can Cram into a Single $&!#* Vector: Probing Sentence Embeddings for Linguistic
Properties.” In Proceedings of the 56th Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long
Papers), 2126–36. Melbourne, Australia.
Forcada, Mikel L., Carolina Scarton, Lucia Specia, Barry Haddow, and Alexandra Birch. 2018. “Exploring
Gap Filling as a Cheaper Alternative to Reading Comprehension Questionnaires When Evaluating Machine Translation for
Gisting.” In Proceedings of the Third Conference on Machine
Translation: Research Papers, 192–203. Brussels, Belgium.
Freitag, Markus, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, and Wolfgang Macherey. 2021. “Experts,
Errors, and Context: A Large-Scale Study of Human Evaluation for Machine
Translation.” Transactions of the Association for Computational
Linguistics 9: 1460–74.
Gehring, Jonas, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. “Convolutional
Sequence to Sequence Learning.” In Proceedings of the 34th
International Conference on Machine Learning, edited by D. Precup and Y. W. Teh, 70: 1243–52. Sydney, Australia.
Giulianelli, Mario, Jack Harding, Florian Mohnert, Dieuwke Hupkes, and Willem Zuidema. 2018. “Under
the Hood: Using Diagnostic Classifiers to Investigate and Improve How Language Models Track Agreement
Information.” In Proceedings of the 2018 EMNLP Workshop BlackboxNLP:
Analyzing and Interpreting Neural Networks for
NLP, 240–48. Brussels, Belgium.
Guillou, Liane, and Christian Hardmeier. 2016. “PROTEST:
A Test Suite for Evaluating Pronouns in Machine
Translation.” In Proceedings of the Tenth International Conference on
Language Resources and Evaluation
(LREC’16), 636–43. Portorož, Slovenia.
Guillou, Liane, Christian Hardmeier, Preslav Nakov, Sara Stymne, Jörg Tiedemann, Yannick Versley, Mauro Cettolo, Bonnie Webber, and Andrei Popescu-Belis. 2016. “Findings
of the 2016 WMT Shared Task on Cross-Lingual Pronoun
Prediction.” In Proceedings of the First Conference on Machine
Translation: Volume 2, Shared Task
Papers, 525–42. Berlin, Germany.
Hardmeier, Christian, Preslav Nakov, Sara Stymne, Jörg Tiedemann, Yannick Versley, and Mauro Cettolo. 2015. “Pronoun-Focused
MT and Cross-Lingual Pronoun Prediction: Findings of the 2015 DiscoMT Shared Task on Pronoun
Translation.” In Proceedings of the Second Workshop on Discourse in
Machine Translation, 1–16. Lisbon, Portugal.
Hewitt, John, and Percy Liang. 2019. “Designing
and Interpreting Probes with Control Tasks.” In Proceedings of the
2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP), 2733–43. Hong Kong, China.
Hovy, Eduard, Margaret King, and Andrei Popescu-Belis. 2002. “Principles
of Context-Based Machine Translation Evaluation.” Machine
Translation 17 (1): 43–75.
Isabelle, Pierre, Colin Cherry, and George Foster. 2017. “A
Challenge Set Approach to Evaluating Machine
Translation.” In Proceedings of the 2017 Conference on Empirical
Methods in Natural Language
Processing, 2486–96. Copenhagen, Denmark.
King, Margaret, and Kirsten Falkedal. 1990. “Using
Test Suites in Evaluation of Machine Translation Systems.” In Papers
Presented to the 13th International Conference on Computational Linguistics. COLING
1990.
Krubiński, Mateusz, Erfan Ghadery, Marie-Francine Moens, and Pavel Pecina. 2021. “Just
Ask! Evaluating Machine Translation by Asking and Answering
Questions.” In Proceedings of the Sixth Conference on Machine
Translation, 495–506. Online.
Kübler, Natalie. 2008. “A
Comparable Learner Translator Corpus: Creation and Use.” In Proc. Of
LREC 2008 Workshop on Building and Using Comparable
Corpora, 73–78. BUCC. Marrakech, Morocco.
Läubli, Samuel, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, and Antonio Toral. 2020. “A
Set of Recommendations for Assessing Human-Machine Parity in Language Translation.” Journal of
Artificial Intelligence
Research 67: 653–72.
Lommel, Arle, Hans Uszkoreit, and Aljoscha Burchardt. 2014. “Multidimensional
Quality Metrics (MQM): A Framework for Declaring and Describing Translation Quality
Metrics.” Revista Tradumàtica: Tecnologies de La
Traducció, no. 12: 455–63.
Maruf, Sameen, Fahimeh Saleh, and Gholamreza Haffari. 2021. “A
Survey on Document-Level Neural Machine Translation: Methods and Evaluation.” ACM Comput.
Surv. 54 (2).
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. “BLEU:
A Method for Automatic Evaluation of Machine
Translation.” In Proceedings of the 40th Annual Meeting on
Association for Computational Linguistics, 311–18. ACL
’02. Stroudsburg, PA, USA.
Pierce, John R., John B. Carroll, Eric P. Hamp, David G. Hays, Charles F. Hockett, Anthony G. Oettinger, and Alan Perlis. 1966. “Language
and Machines – Computers in Translation and Linguistics.” Washington, DC: ALPAC Report, National Academy of Sciences.
Raganato, Alessandro, Yves Scherrer, and Jörg Tiedemann. 2019. “The
MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine
Translation.” In Proceedings of the Fourth Conference on Machine
Translation (Volume 2: Shared Task Papers, Day
1), 470–80. Florence, Italy.
Rei, Ricardo, Craig Stewart, Ana C. Farinha, and Alon Lavie. 2020. “COMET:
A Neural Framework for MT Evaluation.” In Proceedings of the 2020
Conference on Empirical Methods in Natural Language Processing
(EMNLP), 2685–2702. Online.
Rios, Annette, Mathias Müller, and Rico Sennrich. 2018. “The
Word Sense Disambiguation Test Suite at WMT18.” In Proceedings of the
Third Conference on Machine Translation: Shared Task
Papers, 588–96. Brussels, Belgium.
Rudin, Cynthia. 2019. “Stop
Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models
Instead.” Nature Machine
Intelligence 1 (5): 206–15.
Saunders, Danielle, and Bill Byrne. 2020. “Reducing
Gender Bias in Neural Machine Translation as a Domain Adaptation
Problem.” In Proceedings of the 58th Annual Meeting of the Association for
Computational
Linguistics, 7724–36. Online.
Scarton, Carolina, and Lucia Specia. 2016. “A
Reading Comprehension Corpus for Machine Translation
Evaluation.” In Proceedings of the Tenth International Conference on
Language Resources and Evaluation
(LREC’16), 3652–58. Portorož, Slovenia.
Sennrich, Rico. 2017. “How
Grammatical Is Character-Level Neural Machine Translation? Assessing MT Quality with Contrastive Translation
Pairs.” In Proceedings of the 15th Conference of the European Chapter
of the Association for Computational Linguistics: Volume 2, Short
Papers, 376–82. Valencia, Spain.
Shi, Xing, Inkit Padhi, and Kevin Knight. 2016. “Does
String-Based Neural MT Learn Source Syntax?” In Proceedings of the
2016 Conference on Empirical Methods in Natural Language
Processing, 1526–34. Austin, Texas.
Snover, Matthew, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. “A
Study of Translation Edit Rate with Targeted Human
Annotation.” In Proceedings of the Seventh Conference of the
Association for Machine Translation in the Americas
(AMTA), 223–31. Boston, Massachusetts, USA.
Specia, Lucia, Carolina Scarton, and Gustavo Henrique Paetzold. 2018. Quality
Estimation for Machine Translation. Synthesis Lectures on Human Language
Technologies. Morgan & Claypool Publishers.
Thompson, Brian, and Matt Post. 2020. “Automatic
Machine Translation Evaluation in Many Languages via Zero-Shot
Paraphrasing.” In Proceedings of the 2020 Conference on Empirical
Methods in Natural Language Processing
(EMNLP), 90–121. Online.
Vanmassenhove, Eva, Jinhua Du, and Andy Way. 2017. “Investigating
‘Aspect’ in NMT and SMT: Translating the English Simple Past and Present
Perfect.” Computational Linguistics in the Netherlands
Journal 7: 109–28.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention
Is All You Need.” In Advances in Neural Information Processing
Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5998–6008.
Vig, Jesse, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart Shieber. 2020. “Investigating Gender Bias in Language Models Using Causal Mediation Analysis.” In Advances in Neural Information Processing Systems 33, 12388–12401. Curran Associates, Inc.
Voita, Elena, and Ivan Titov. 2020. “Information-Theoretic Probing with Minimum Description Length.” In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 183–96. Online: Association for Computational Linguistics.
Voita, Elena, Rico Sennrich, and Ivan Titov. 2019. “When
a Good Translation Is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical
Cohesion.” In Proceedings of the 57th Annual Meeting of the
Association for Computational
Linguistics, 1198–1212. Florence, Italy.
Wisniewski, Guillaume, Lichao Zhou, Nicolas Ballier, and François Yvon. 2021. “Biais
de genre dans un système de traduction automatique neuronale : une étude
préliminaire.” In Traitement Automatique des Langues
Naturelles, edited by P. Denis, N. Grabar, A. Fraisse, R. Cardon, B. Jacquemin, E. Kergosien, and A. Balvet, 11–25. Lille, France.
