In: Recent Advances in Multiword Units in Machine Translation and Translation Technology
Edited by Johanna Monti, Gloria Corpas Pastor, Ruslan Mitkov and Carlos Manuel Hidalgo-Ternero
[Current Issues in Linguistic Theory 366] 2024
pp. 2–17
Chapter 1. Multi-word units in neural machine translation
Why the tip of the iceberg remains problematic
Published online: 7 November 2024
https://doi.org/10.1075/cilt.366.01col
Abstract
Neural machine translation (NMT) has recently made significant progress in improving the quality of
the texts it produces. New features of NMT include the fluidity of translations and the successful handling of
multi-word units. In this paper we first report the results of an automated evaluation of the percentage of
phraseology in the translations produced by Google Translate and DeepL. A corpus-based approach makes it possible to
estimate that both NMT systems succeed in producing an average percentage of phraseology that is quite reasonable and
sometimes even higher than in natural language production by native speakers. However, a closer look at some
problematic cases shows that the ability of NMT systems to treat phraseological units can be deceptive, as they are
often unable to cope with contextual complexity and low-frequency idioms.
Article outline
- 1. Introduction: Lingering doubts about neural machine translation
- 2. Are texts produced by NMT rich in phraseology? An experiment
- 3. Looking closer at problematic examples for NMT
- 4. Fine-tuning NMT for phraseology: An experiment
- 5. Conclusion
- Notes
- References
- Appendix
