Article published in: From Disruptions to New Beginnings: The evolution of translation (studies) through technologies
Edited by Federico Gaspari and Silvia Bernardini
[Translation and Translanguaging in Multilingual Contexts 11:3] 2025
pp. 334–360
Navigating translation equivalence in news corpora
Construction and analysis of a comparallel Greek-English corpus on migration
Published online: 19 August 2025
https://doi.org/10.1075/ttmc.00171.fer
Abstract
This article introduces a method to identify and classify translation equivalences in multilingual news texts and
applies it to the task of creating a corpus for the study of news translation, a notably challenging area within Translation
Studies. The dataset is composed of 41 Greek-English news dispatches on the topic of migration by AMNA, the Greek national news
agency. Conceptually, we build on previous research on ‘comparallel’ corpus architectures, which bring together features of
comparable and parallel corpora and provide the necessary flexibility to account for the non-prototypical translated data
characterizing multilingual news. The automated method uses state-of-the-art Natural Language Processing techniques, namely
sentence and word embeddings, which make it possible to account for nuanced translation relationships, distinguishing between
translated, partially translated, related, and unrelated sentence pairs. We test the method against a benchmark of manually
annotated sentences from the AMNA dataset and provide examples of correctly and incorrectly classified sentence pairs. Finally, we
build a fully fledged comparallel corpus based on the dataset and present a case study demonstrating how the corpus can be
leveraged for corpus-assisted studies of news discourse, and most notably to investigate newsworthiness and ideological shifts
occurring in multilingual news.
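As a rough illustration of the four-way distinction described above, the following Python sketch assigns a sentence pair to one of the four categories based on the similarity of two sentence vectors. It is not the authors' implementation: the use of cosine similarity, the cut-off values, and the names `cosine`, `THRESHOLDS` and `classify_pair` are assumptions made for illustration, and toy vectors stand in for real multilingual sentence embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as sharing nothing
    return dot / (norm_u * norm_v)


# Hypothetical cut-offs, highest first; real values would have to be
# tuned against a manually annotated benchmark.
THRESHOLDS = [
    (0.85, "translated"),
    (0.65, "partially translated"),
    (0.40, "related"),
]


def classify_pair(src_vec, tgt_vec):
    """Map a pair of sentence vectors to one of four relationship labels."""
    score = cosine(src_vec, tgt_vec)
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "unrelated"


# Toy two-dimensional vectors in place of real sentence embeddings.
print(classify_pair([1.0, 0.0], [1.0, 0.0]))
print(classify_pair([1.0, 0.0], [0.0, 1.0]))
```

In practice the vectors would come from a multilingual sentence encoder, and a single similarity score may not be enough to separate partial translations from merely related sentences, which is presumably where word-level information comes in.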
Article outline
- 1. Introduction
- 2. Related work
- 2.1 News translation
- 2.2 Corpora of non-prototypical translation data
- 3. Method
- 4. Results
- 4.1 The four-way classification of translation relationships
- 4.2 Performance evaluation
- 4.3 Error analysis
- 5. Case study: Using a comparallel corpus to investigate newsworthiness and shifts
- 6. Discussion and conclusion
- Acknowledgements
- Notes
- References
