Machine translation for language access in government settings: A comparative study of LLMs, NMTs, and human translations of vital documents

Rodríguez, Stephanie A.; Jiménez-Crespo, Miguel A.

doi:10.1075/dt.25018.rod

Article in Digital Translation: Online-First Articles

Stephanie A. Rodríguez | Rutgers University

Miguel A. Jiménez-Crespo | Rutgers University

This content is being prepared for publication; it may be subject to changes.

Abstract

Recent legislative mandates have expanded language access in government services, yet research on integrating machine translation (MT) into these services remains limited. This study evaluates the quality and efficiency of two neural machine translation (NMT) systems (DeepL and Google Translate) and one large language model (LLM), GPT-4, in translating government legal documents from English into Spanish. Twenty-seven professional translators carried out human translation (HT), machine translation post-editing (MTPE), and quality evaluation. Translation quality was measured with an adapted Multidimensional Quality Metrics (MQM) framework, and technical and temporal post-editing effort was measured with keylogging software. The findings indicate that (a) MTPE significantly reduces translation completion time compared with HT; (b) GPT-4 achieves higher overall quality scores than the two NMT engines, DeepL and Google Translate; and (c) MTPE and HT yield similar overall quality. The study underscores the potential of LLM-based translation, combined with professional human post-editing, as a high-quality and efficient way to meet growing language access demands in government contexts. These findings offer practical insights for policymakers and translation professionals in public services.
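The abstract reports quality in MQM terms, and the outline lists an absolute penalty total and an overall quality score among the results. The sketch below shows how those two figures are commonly derived from error annotations under an MQM-style raw scoring model. The severity multipliers (minor = 1, major = 5, critical = 25), the 100-point scale, the category labels, and the function names are assumptions for illustration only; the study used an adapted MQM framework whose exact categories and weights are not given here.

```python
from dataclasses import dataclass
from typing import Sequence

# ASSUMPTION: illustrative MQM severity multipliers; the study's adapted
# framework may weight errors differently.
SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 25}


@dataclass(frozen=True)
class ErrorAnnotation:
    category: str  # hypothetical label, e.g. "accuracy/mistranslation"
    severity: str  # must be a key of SEVERITY_WEIGHTS


def absolute_penalty_total(errors: Sequence[ErrorAnnotation]) -> int:
    """Absolute penalty total (APT): sum of severity multipliers."""
    return sum(SEVERITY_WEIGHTS[e.severity] for e in errors)


def overall_quality_score(
    errors: Sequence[ErrorAnnotation],
    evaluation_word_count: int,
    max_score: float = 100.0,
) -> float:
    """Raw MQM-style score: max_score * (1 - APT / evaluation word count)."""
    if evaluation_word_count <= 0:
        raise ValueError("evaluation_word_count must be positive")
    apt = absolute_penalty_total(errors)
    # Algebraically equal to max_score * (1 - apt / evaluation_word_count).
    return max_score - max_score * apt / evaluation_word_count


# Hypothetical annotations for one 350-word target text.
annotations = [
    ErrorAnnotation("accuracy/mistranslation", "major"),
    ErrorAnnotation("fluency/punctuation", "minor"),
    ErrorAnnotation("terminology", "minor"),
]
print(absolute_penalty_total(annotations))      # 7
print(overall_quality_score(annotations, 350))  # 98.0
```

Normalizing the penalty total by the evaluation word count puts texts of different lengths on one scale, which is what makes an overall score comparable across HT, MTPE, and raw MT conditions.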

Keywords: language access, translation technology, machine translation, translation quality assessment, human annotation, post-editing effort

Article outline

1. Introduction
2. Related work
- 2.1 Language access policy
- 2.2 Machine translation in the legal sector
- 2.3 Post-editing of MT
- 2.4 MT quality assessment
3. Methods
- 3.1 Research design
- 3.2 Participants
- 3.3 Materials
- 3.4 Procedures
  - 3.4.1 Text selection
  - 3.4.2 Translation process
  - 3.4.3 Human annotation of translation quality
- 3.5 Statistical analyses
4. Results
- 4.1 Translation quality: Absolute penalty total
- 4.2 Translation quality: Overall quality score
- 4.3 Technical effort
- 4.4 Temporal effort
5. Discussion
6. Conclusion
Notes
Author queries
References

References

Alvarez-Vidal, Sergi, and Antoni Oliver. 2023. “Assessing MT with Measures of PE Effort.” Ampersand 11: 100125.

Bajčić, Martina, and Dejana Golenko. 2024. “Applying Large Language Models in Legal Translation: The State of the Art.” International Journal of Language and Law 13: 171–196.

Bowker, Lynne. 2025. “Machine Translation Literacy.” In The Palgrave Encyclopedia of Computer-Assisted Language Learning, edited by L. McCallum and D. Tafazoli. Cham: Palgrave Macmillan.

Briva-Iglesias, Vicent. 2021. “Traducción Humana vs. Traducción Automática: Análisis Contrastivo e Implicaciones para la Aplicación de la Traducción Automática en Traducción Jurídica.” Mutatis Mutandis 14: 571–600.

Briva-Iglesias, Vicent, João Luis Camargo, and Gokhan Dogru. 2024. “Large Language Models ‘Ad Referendum’: How Good Are They at Machine Translation in the Legal Domain?” MonTI. Monografías de Traducción e Interpretación 16: 75–107.

Briva-Iglesias, Vicent, Benjamin R. Cowan, and Sharon O’Brien. 2023. “The Impact of Traditional and Interactive Post-Editing on Machine Translation User Experience, Quality, and Productivity.” Translation, Cognition & Behavior 6: 58–84.

Carpuat, Marine, Omri Asscher, Kalika Bali, Luisa Bentivogli, Frédéric Blain, Lynne Bowker, Monojit Choudhury, Hal Daumé III, Kevin Duh, Ge Gao, Alvin Grissom II, Marzena Karpinska, Elaine Khoong, William Lewis, André Martins, Mary Nurminen, Douglas Oard, Maja Popović, Michel Simard, and François Yvon. 2025. “An Interdisciplinary Approach to Human-Centered Machine Translation.” arXiv.

Castaldo, Antonio, Sheila Castilho, Joss Moorkens, and Johanna Monti. 2025. “Extending CREAMT: Leveraging Large Language Models for Literary Translation Post-Editing.” arXiv.

Chichirau, Malina, Rik van Noord, and Antonio Toral. 2023. “Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios.” In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 217–226. Tampere, Finland: European Association for Machine Translation.

Cui, Ying, Xiao Liu, and Yuqin Cheng. 2023. “A Comparative Study on the Effort of Human Translation and Post-Editing in Relation to Text Types: An Eye-Tracking and Key-Logging Experiment.” SAGE Open 13.

Drugan, Joanna, Ingemar Strandvik, and Erkka Vuorinen. 2018. “Translation Quality, Quality Management and Agency: Principles and Practice in the European Union Institutions.” In Translation Quality Assessment: From Principles to Practice, edited by Joss Moorkens, Sheila Castilho, Federico Gaspari, and Stephen Doherty, 39–68. Cham: Springer International Publishing.

Freitag, Markus, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, and Wolfgang Macherey. 2021. “Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation.” Transactions of the Association for Computational Linguistics 9: 1460–1474.

Freitag, Markus, Nitika Mathur, Daniel Deutsch, Chi-Kiu Lo, Eleftherios Avramidis, Ricardo Rei, and Alon Lavie. 2024. “Are LLMs Breaking MT Metrics? Results of the WMT24 Metrics Shared Task.” In Proceedings of the Ninth Conference on Machine Translation, 47–81.

Giampieri, Patrizia. 2023a. Legal Machine Translation Explained: MT in Legal Contexts. Newcastle upon Tyne: Cambridge Scholars Publishing.

Giampieri, Patrizia. 2023b. “Is Machine Translation Reliable in the Legal Field? A Corpus-Based Critical Comparative Analysis for Teaching ESP at Tertiary Level.” ESP Today 11 (1): 119–137.

Giampieri, Patrizia. 2025. “Assessing the Quality of AI and MT in Legal Translation.” Altre Modernità: 143–159.

Guerberof Arenas, Ana, and Joss Moorkens. 2019. “Machine Translation and Post-Editing Training as Part of a Master’s Programme.” The Journal of Specialised Translation: 217–238.

Hajek, John, Anthony Pym, Yu Hao, Maria Karidakis, Ambrin Hasnain, Anila Hasnain, Juerong Qiu, Ke Hu, and Rachel Macreadie. 2024. “Understanding and Improving Machine Translations for Emergency Communications.”

Jakobsen, Arnt Lykke. 2017. “Translation Process Research.”

Kenny, Dorothy. 2022. “Human and Machine Translation.” In Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence, 18–23.

Krings, Hans P. 2001. Repairing Texts: Empirical Investigations of Machine Translation Post-Editing Processes. Kent, OH: Kent State University Press.

Lankford, Séamus, Haithem Afli, and Andy Way. 2023. “adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds.” Information 14: 638.

Larroyed, Aline. 2023. “Redefining Patent Translation: The Influence of ChatGPT and the Urgency to Align Patent Language Regimes in Europe with Progress in Translation Technology.” GRUR International 72 (11): 1009–1017.

Läubli, Samuel, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, and Antonio Toral. 2020. “A Set of Recommendations for Assessing Human-Machine Parity in Language Translation.” arXiv.

Wei, Johnny, and Robin Jia. 2021. “The Statistical Advantage of Automatic NLG Metrics at the System Level.” In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 6840–6854. Association for Computational Linguistics.