In: Corpus Use in Cross-linguistic Research: Paving the way for teaching, translation and professional communication
Edited by Marlén Izquierdo and Zuriñe Sanz-Villar
[Studies in Corpus Linguistics 113] 2023
pp. 179–194
Chapter 10. Domain-adapting and evaluating machine translation for institutional German in South Tyrol
Published online: 2 November 2023
https://doi.org/10.1075/scl.113.10con
Building on a prior small-scale study of machine translation adaptation in the same language pair and domain (De Camillis 2021), this chapter reports on a three-part experiment for legal South Tyrolean German: (i) domain adaptation, (ii) quality assessment, and (iii) automatic evaluation of legal terminology. After collecting, sentence-aligning, and cleaning the LEXB parallel corpus, we used this bilingual resource to domain-adapt a ModernMT engine. Performance improvements were measured with automatic quality metrics. The machine translation of South Tyrolean legal terms was evaluated with an ad hoc automatic terminology evaluation tool. We observed a significant boost in performance and term accuracy in the output of the adapted ModernMT engine, but the improvement in legal terminology translation was deemed unsatisfactory.
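To make the notion of term accuracy concrete, the following is a minimal sketch of an automatic terminology check. It is not the ad hoc tool used in the chapter. The function name `term_accuracy`, the termbase layout, the Italian-to-German direction, and the example terms are all assumptions made for illustration. It counts how many source-side termbase hits have an accepted target term in the MT output, using plain case-insensitive substring matching.

```python
from typing import Dict, List, Tuple


def term_accuracy(
    sources: List[str],
    hypotheses: List[str],
    termbase: Dict[str, List[str]],
) -> Tuple[int, int, float]:
    """Return (found, total, accuracy) for termbase entries in MT output.

    Assumption: a term counts as correctly translated when any accepted
    target form occurs in the hypothesis as a case-insensitive substring.
    Inflected forms must therefore be listed explicitly in the termbase.
    """
    found = 0
    total = 0
    for src, hyp in zip(sources, hypotheses):
        src_lower = src.lower()
        hyp_lower = hyp.lower()
        for src_term, tgt_terms in termbase.items():
            if src_term.lower() in src_lower:
                total += 1  # the source segment contains a known term
                if any(t.lower() in hyp_lower for t in tgt_terms):
                    found += 1  # an accepted target form was produced
    accuracy = found / total if total else 0.0
    return found, total, accuracy


# Illustrative data only (hypothetical segments and termbase entries).
example_termbase = {
    "decreto legislativo": ["gesetzesvertretendes Dekret",
                            "gesetzesvertretende Dekret"],
    "carta d'identità": ["Identitätskarte"],
}
example_sources = [
    "Il decreto legislativo è stato approvato.",
    "La carta d'identità è scaduta.",
]
example_hypotheses = [
    "Das gesetzesvertretende Dekret wurde genehmigt.",
    "Der Personalausweis ist abgelaufen.",
]
found, total, accuracy = term_accuracy(
    example_sources, example_hypotheses, example_termbase
)
print(f"{found}/{total} terms translated as expected ({accuracy:.0%})")
```

Substring matching keeps the sketch short but handles German inflection poorly, so a realistic tool would more likely compare lemmatised forms.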
Article outline
- 1. Introduction
- 2. German in South Tyrol
- 3. Previous experiments on MT and legal German
- 4. Methodology
- 4.1 Corpus building
- 4.2 MT adaptation and testing
- 4.3 Automatic terminology evaluation
- 5. Results
- 6. Conclusions
References (27)
Alam, Md Mahfuz ibn, Anastasopoulos, Antonios, Besacier, Laurent, Cross, James, Gallé, Matthias, Koehn, Philipp & Nikoulina, Vassilina. 2021. On the evaluation of machine translation for terminology consistency. ArXiv:2106.11891 [Cs]. [URL]
Beller, Manfredi & Leerssen, Joep (eds). 2007. Imagology: The Cultural Construction and Literary Representation of National Characters [Studia Imagologica 13]. Amsterdam & New York: Rodopi.
Bertoldi, Nicola, Caroselli, Davide & Federico, Marcello. 2018. The ModernMT project. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Maja Popović et al. (eds), 345. Alicante: European Association for Machine Translation. <[URL]> (30 May 2023).
Callison-Burch, Chris, Osborne, Miles & Koehn, Philipp. 2006. Re-evaluating the role of Bleu in machine translation research. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Diana McCarthy & Shuly Wintner (eds), 249–256. Trento: Association for Computational Linguistics.
Chiocchetti, Elena, Kranebitter, Klara, Ralli, Natascia & Stanizzi, Isabella. 2013. Deutsch ist nicht gleich Deutsch. Eine terminologische Analyse zu den Besonderheiten der deutschen Rechtssprache in Südtirol. In Diatopische Variation in der deutschen Rechtssprache, Marina Marzia Brambilla, Joachim Gerdes, & Chiara Messina (eds), 253–285. Berlin: Frank & Timme. <[URL]> (30 May 2023).
Chiocchetti, Elena. 2019a. Legal comparison in terminology work: Developing the South Tyrolean German legal language. In Diszciplínák találkozása: Nyelvi közvetítés a XXI. században, Szilvia Szoták (ed.), 175–185. Budapest: OFFI.
Chiocchetti, Elena. 2019b. Terminology work in South Tyrol: New approaches, new termbase, new contents. Terminologija 26: 6–23.
Chiocchetti, Elena, Ralli, Natascia & Stanizzi, Isabella. 2017. From DIY translations to official standardisation and back again? 50 years of experience with Italian and German legal terminology work in South Tyrol. In Terms and Terminology in the European Context, Paola Faini (ed.), 254–270. Newcastle upon Tyne: Cambridge Scholars.
Contarino, Antonio Giovanni. 2021. Neural Machine Translation Adaptation and Automatic Terminology Evaluation: A Case Study on Italian and South Tyrolean German Legal Texts. Master’s dissertation, University of Bologna. <[URL]> (30 May 2023).
De Camillis, Flavia. 2021. La traduzione non professionale nelle istituzioni pubbliche dei Territori di lingua minoritaria: Il caso di studio dell’amministrazione della Provincia Autonoma di Bolzano. PhD dissertation, University of Bologna. <[URL]> (30 May 2023).
Dougal, Duane K. & Lonsdale, Deryle. 2020. Improving NMT quality using terminology injection. In Proceedings of the 12th Language Resources and Evaluation Conference, 4820–4827. Marseille: European Language Resources Association. <[URL]> (30 May 2023).
DPR 670/1972. Decreto del Presidente della Repubblica 31 agosto 1972, n. 670 “Approvazione del testo unico delle leggi costituzionali concernenti lo statuto speciale per il Trentino – Alto Adige”. <[URL]> (30 May 2023).
Farajian, M. Amin, Bertoldi, Nicola, Negri, Matteo, Turchi, Marco & Federico, Marcello. 2018. Evaluation of terminology translation in instance-based neural MT adaptation. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Maja Popović et al. (eds), 149–158. Alicante: European Association for Machine Translation. <[URL]> (30 May 2023).
Haque, Rejwanul, Hasanuzzaman, Mohammed & Way, Andy. 2019. Terminology translation in low-resource scenarios. Information 10(9), 273, 1–28.
Heiss, Christine & Soffritti, Marcello. 2018. DeepL traduttore e didattica della traduzione dall’italiano in tedesco. In Translation And Interpreting for Language Learners (TAIL). Special issue of inTRAlinea, 1–11. <[URL]> (30 May 2023).
Koehn, Philipp. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Dekang Lin & Dekai Wu (eds), 388–395. Barcelona: Association for Computational Linguistics. <[URL]> (30 May 2023).
Koehn, Philipp & Knowles, Rebecca. 2017. Six challenges for neural machine translation. In Proceedings of the First Workshop on Neural Machine Translation, Thang Luong, Alexandra Birch, Graham Neubig & Andrew Finch (eds), 28–39. Vancouver: Association for Computational Linguistics. <[URL]> (30 May 2023).
Marie, Benjamin, Fujita, Atsushi & Rubino, Raphael. 2021. Scientific credibility of machine translation research: A meta-evaluation of 769 papers. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Vol. 1: Long Papers, 7297–7306. Stroudsburg PA: Association for Computational Linguistics. (30 May 2023)
Papineni, Kishore, Roukos, Salim, Ward, Todd & Zhu, Wei-Jing. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Pierre Isabelle, Eugene Charniak & Dekang Lin (eds), 311–318. Stroudsburg PA: Association for Computational Linguistics.
Popović, Maja. 2015. ChrF: Character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow et al. (eds), 392–395. Lisbon: Association for Computational Linguistics.
Post, Matt. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel et al. (eds), 186–191. Brussels: Association for Computational Linguistics.
Ralli, Natascia & Andreatta, Norbert. 2018. bistro – ein Tool für mehrsprachige Rechtsterminologie. trans-kom 11(1): 7–44. <[URL]> (30 May 2023).
Saunders, Danielle. 2021. Domain adaptation and multi-domain adaptation for neural machine translation: A survey. CoRR abs/2104.06951. <[URL]> (30 May 2023).
van der Wees, Marlies. 2017. What’s in a Domain?: Towards Fine-grained Adaptation for Machine Translation. PhD dissertation, University of Amsterdam. <[URL]> (30 May 2023).
Varga, Daniel, Halacsy, Peter, Kornai, Andras, Nagy, Viktor, Nemeth, Laszlo & Tron, Viktor. 2007. Parallel corpora for medium density languages. In Recent Advances in Natural Language Processing IV: Selected papers from RANLP 2005 [Current Issues in Linguistic Theory 292], Nicolas Nicolov, Kalina Bontcheva, Galia Angelova & Ruslan Mitkov (eds), 247–258. Amsterdam: John Benjamins.
