Chapter 3. Evaluating the Italian-English machine translation quality of MWUs in the domain of archaeology

Speranza, Giulia; Monti, Johanna

doi:10.1075/cilt.366.03spe

In: Recent Advances in Multiword Units in Machine Translation and Translation Technology
Edited by Johanna Monti, Gloria Corpas Pastor, Ruslan Mitkov and Carlos Manuel Hidalgo-Ternero
[Current Issues in Linguistic Theory 366] 2024
pp. 40–56

Giulia Speranza | University of Naples “L’Orientale”, UNIOR NLP Research Group

Johanna Monti | University of Naples “L’Orientale”, UNIOR NLP Research Group

Published online: 7 November 2024

Abstract

Multiword units (MWUs) pose a challenging linguistic problem in Natural Language Processing (NLP) because of their idiosyncratic nature. This chapter investigates the quality of Neural Machine Translation (NMT) output for MWUs in the domain of archaeology. As a case study, a dataset of 100 MWUs serves as a Gold Standard to evaluate out-of-context and in-context translations produced by three state-of-the-art NMT systems for the Italian-English language pair: Google Translate, DeepL, and Microsoft Bing Translator. The MT outputs are manually evaluated against the Gold Standard, namely out-of-context and in-context human English translations of the 100 selected MWUs. Results show that terminology remains a problematic category for MT quality and that MWU translation may vary, and sometimes even improve, when more context is provided.
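The evaluation described above compares each system's output with the Gold Standard in two conditions, out of context and in context. As a hedged illustration only, assuming each MT output has been manually labelled by an evaluator as matching the Gold Standard or not, a minimal Python sketch of how such judgments could be tallied per system and condition is given here. The binary label, the `Judgment` record, the system names and both function names are hypothetical simplifications introduced for this sketch; they do not reproduce the chapter's actual evaluation scheme, error categories, or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One hypothetical manual judgment of an MT output against the Gold Standard."""
    mwu_id: int    # identifier of the MWU in the dataset
    system: str    # name of the MT system that produced the output
    context: str   # "out" = MWU translated in isolation, "in" = within its sentence
    correct: bool  # assumption: binary label, True if judged to match the Gold Standard


def accuracy_by_condition(judgments):
    """Return {(system, context): share of MWUs judged correct}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for j in judgments:
        key = (j.system, j.context)
        totals[key] += 1
        hits[key] += int(j.correct)
    return {key: hits[key] / totals[key] for key in totals}


def context_effect(judgments):
    """For each system, count MWUs whose judgment changes once context is added.

    "improved": wrong out of context, correct in context.
    "worsened": correct out of context, wrong in context.
    Only MWUs judged in both conditions for the same system are compared.
    """
    label = {(j.system, j.mwu_id, j.context): j.correct for j in judgments}
    effect = defaultdict(lambda: {"improved": 0, "worsened": 0})
    for (system, mwu_id, context), out_correct in label.items():
        if context != "out":
            continue
        in_correct = label.get((system, mwu_id, "in"))
        if in_correct is None:
            continue  # no in-context judgment to compare against
        counts = effect[system]
        if not out_correct and in_correct:
            counts["improved"] += 1
        elif out_correct and not in_correct:
            counts["worsened"] += 1
    return dict(effect)
```

For example, if a hypothetical `SystemA` mistranslates an MWU in isolation but renders it correctly inside its sentence, `context_effect` records one "improved" MWU for `SystemA`. Binary labels keep the sketch short; a scheme with graded scores or error categories would need a richer field than `correct`.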

Keywords: multiword units, terminology, machine translation, evaluation, error analysis, archaeology

Article outline

1. Introduction
2. Related work
- 2.1 Terminology translation
- 2.2 Terminology translation evaluation
3. Experimental setup
- 3.1 The dataset
- 3.2 The evaluation methodology
- 3.3 Global evaluation
- 3.4 Local error analysis
4. Conclusions
Notes
References

References

Arcan, M., Torregrosa, D., & Buitelaar, P. (2017). Translating terminological expressions in knowledge bases with neural machine translation. arXiv preprint arXiv:1709.02184.

Arcan, M., Turchi, M., Tonelli, S., & Buitelaar, P. (2014). Enhancing statistical machine translation with bilingual terminology in a CAT environment. Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014) (pp. 54–68). Association for Machine Translation in the Americas.

Chatterjee, R., Negri, M., Turchi, M., Federico, M., Specia, L., & Blain, F. (2017). Guiding neural machine translation decoding with external knowledge. Proceedings of the Second Conference on Machine Translation (pp. 157–168).

Chen, L. H., & Kageura, K. (2019). Translating terminologies: A comparative examination of NMT and PBSMT systems. Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks (pp. 101–108).

Dinu, G., Mathur, P., Federico, M., & Al-Onaizan, Y. (2019). Training neural machine translation to apply terminology constraints. arXiv preprint arXiv:1906.01105.

Fadaee, M., Bisazza, A., & Monz, C. (2017). Data augmentation for low-resource neural machine translation. arXiv preprint arXiv:1705.00440.

Farajian, M. A., Bertoldi, N., Negri, M., Turchi, M., & Federico, M. (2018). Evaluation of terminology translation in instance-based neural MT adaptation. Proceedings of the 21st Annual Conference of the European Association for Machine Translation (EAMT 2018).

Haque, R., Hasanuzzaman, M., & Way, A. (2019a). Investigating terminology translation in statistical and neural machine translation: A case study on English-to-Hindi and Hindi-to-English. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019) (pp. 437–446).

Haque, R., Hasanuzzaman, M., & Way, A. (2019b). TermEval: An automatic metric for evaluating terminology translation in MT. The 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2019), La Rochelle, France.

Hassan, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., Huang, X., Junczys-Dowmunt, M., Lewis, W., Li, M., Liu, S., Liu, T.-Y., Luo, R., Menezes, A., Qin, T., Seide, F., Tan, X., Tian, F., Wu, L., Wu, S., Xia, Y., Zhang, D., Zhang, Z., & Zhou, M. (2018). Achieving human parity on automatic Chinese to English news translation. arXiv preprint arXiv:1803.05567.

Hayakawa, T., & Arase, Y. (2020). Fine-grained error analysis on English-to-Japanese machine translation in the medical domain. Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (pp. 155–164).

Isabelle, P., Cherry, C., & Foster, G. (2017). A challenge set approach to evaluating machine translation. arXiv preprint arXiv:1704.07431.

Lommel, A., & Melby, A. K. (2018). Tutorial: MQM-DQF: A good marriage (Translation quality for the 21st Century). Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 2: User Papers).

Macketanz, V., Avramidis, E., Burchardt, A., Helcl, J., & Srivastava, A. (2017). Machine translation: Phrase-based, rule-based and neural approaches with linguistic evaluation. Cybernetics and Information Technologies, 17(2), 28–43.

Michon, E., Crego, J., & Senellart, J. (2020). Integrating domain terminology into neural machine translation. Proceedings of the 28th International Conference on Computational Linguistics (pp. 3925–3937). Barcelona, Spain (Online): International Committee on Computational Linguistics.

Monti, J., Barreiro, A., Elia, A., Marano, F., & Napoli, A. (2011). Taking on new challenges in multi-word unit processing for machine translation. Second International Workshop on Free/Open-Source Rule-Based Machine Translation (pp. 11–19). UOC.EDU.

Monti, J., Barreiro, A., Orliac, B., & Batista, F. (2013). When multiwords go bad in machine translation. Machine Translation Summit XIV (pp. 26–33). The European Association for Machine Translation.

Monti, J., Mitkov, R., Corpas Pastor, G., & Seretan, V. (Eds.). (2018). Multiword units in machine translation and translation technology (Vol. 341). John Benjamins Publishing Company.

Peng, W., Huang, C., Li, T., Chen, Y., & Liu, Q. (2020). Dictionary-based data augmentation for cross-domain neural machine translation. arXiv preprint arXiv:2004.02577.

Ren, Z., Lü, Y., Cao, J., Liu, Q., & Huang, Y. (2009). Improving statistical machine translation using domain bilingual multiword expressions. Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications (MWE 2009) (pp. 47–54).

Rikters, M., & Bojar, O. (2017). Paying attention to multi-word expressions in neural machine translation. arXiv preprint arXiv:1710.06313.

Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. In International Conference on Intelligent Text Processing and Computational Linguistics (pp. 1–15). Springer.

Scansani, R., Bentivogli, L., Bernardini, S., & Ferraresi, A. (2019). MAGMATic: A multi-domain academic gold standard with manual annotation of terminology for machine translation evaluation. Proceedings of Machine Translation Summit XVII Volume 1: Research Track (pp. 78–86).

Thompson, B., Knowles, R., Zhang, X., Khayrallah, H., Duh, K., & Koehn, P. (2019). HABLex: Human annotated bilingual lexicons for experiments in machine translation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 1382–1387).

Vintar, Š. (2018). Terminology translation accuracy in statistical versus neural MT: An evaluation for the English-Slovene language pair. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).

Zaninello, A., & Birch, A. (2020). Multiword expression aware neural machine translation. Proceedings of The 12th Language Resources and Evaluation Conference (pp. 3816–3825).
