Article In: Digital Translation: Online-First Articles
Machine translation for language access in government settings
A comparative study of LLMs, NMTs, and human translations of vital documents
This content is being prepared for publication; it may be subject to changes.
Abstract
Recent legislative mandates have expanded language access in government services, yet research on integrating
machine translation (MT) into these services remains limited. This study evaluates the quality and efficiency of two neural
machine translation (NMT) systems (DeepL and Google Translate) and one large language model (LLM), GPT-4, in translating
government legal documents from English to Spanish. Twenty-seven professional translators performed human
translation (HT), machine translation post-editing (MTPE), and quality evaluation. Translation quality was measured using an
adapted Multidimensional Quality Metrics (MQM) framework, while technical and temporal post-editing effort was measured with
keylogging software. The findings indicate that (a) MTPE significantly reduces translation completion time compared with HT; (b)
GPT-4 achieves higher overall quality scores than the NMT engines DeepL and Google Translate; and (c)
MTPE and HT perform similarly in overall quality. The study underscores the potential of LLM-based translation,
combined with professional human post-editing, as a high-quality and efficient way to meet growing language access demands
in government contexts. These findings offer insights for policymakers and translation professionals in public
services.
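
As a rough illustration of how an MQM-style score can be derived from annotated errors, the sketch below computes an absolute penalty total and an overall quality score. The severity weights (minor = 1, major = 5, critical = 25), the linear scoring formula, and all names are assumptions drawn from common MQM practice, not the study's adapted framework, and the sample annotations are invented for illustration.

```python
from dataclasses import dataclass

# Assumed severity weights (common MQM defaults); the study's adapted
# framework may use different values or additional error-type weights.
SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 25}


@dataclass(frozen=True)
class ErrorAnnotation:
    """One error marked by an evaluator (hypothetical structure)."""
    category: str  # e.g. "accuracy", "terminology", "fluency"
    severity: str  # must be a key of SEVERITY_WEIGHTS


def absolute_penalty_total(annotations):
    """Sum the severity weights of all annotated errors."""
    return sum(SEVERITY_WEIGHTS[a.severity] for a in annotations)


def overall_quality_score(annotations, word_count, max_score=100.0):
    """Linear score: max_score scaled by (1 - penalty per evaluated word)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = absolute_penalty_total(annotations)
    return max_score * (1 - penalty / word_count)


if __name__ == "__main__":
    # Invented sample: two minor errors and one major error in 350 words.
    sample = [
        ErrorAnnotation("terminology", "minor"),
        ErrorAnnotation("fluency", "minor"),
        ErrorAnnotation("accuracy", "major"),
    ]
    print("Absolute penalty total:", absolute_penalty_total(sample))
    print("Overall quality score:", round(overall_quality_score(sample, 350), 2))
```

Under these assumed weights, a single critical error costs as much as twenty-five minor ones, so the two measures can rank translations differently from a raw error count.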
Article outline
- 1. Introduction
- 2. Related work
- 2.1 Language access policy
- 2.2 Machine translation in the legal sector
- 2.3 Post-editing of MT
- 2.4 MT quality assessment
- 3. Methods
- 3.1 Research design
- 3.2 Participants
- 3.3 Materials
- 3.4 Procedures
- 3.4.1 Text selection
- 3.3.2 Translation process
- 3.3.3 Human annotation of translation quality
- 3.4 Statistical analyses
- 4. Results
- 4.1 Translation quality: Absolute penalty total
- 4.2 Translation quality: Overall quality score
- 4.3 Technical effort
- 4.4 Temporal effort
- 5. Discussion
- 6. Conclusion
- Notes
- Author queries