Source language classification of indirect translations

Ivaska, Ilmari; Ivaska, Laura

doi:10.1075/target.00006.iva

Article published in: What can research on indirect translation do for Translation Studies?
Edited by Hanna Pięta, Laura Ivaska and Yves Gambier
[Target 34:3] 2022
pp. 370–394

Ilmari Ivaska | University of Turku

Laura Ivaska | University of Turku | Finnish Literature Society (SKS)

Published online: 11 April 2022

https://doi.org/10.1075/target.00006.iva

Abstract

One of the major barriers to the systematic study of indirect translation – that is, translations of translations – is the lack of efficient methods to identify these translations. In this article, we use supervised machine learning to examine whether computers can be harnessed to identify indirect translations. Our data consist of a monolingual comparable corpus that includes (1) nontranslated Finnish texts, (2) direct translations from English, French, German, Greek, and Swedish into Finnish, and (3) indirect translations from Greek (the ultimate source language) via English, French, German, and Swedish (mediating languages) into Finnish. We use n-grams of various types and lengths as feature sets and random forests as the statistical classification technique. To maximize the transferability of the method, the feature sets were implemented in accordance with the Universal Dependencies framework. This study confirms that computers can distinguish between translated and nontranslated Finnish, as well as between Finnish translations made from different source languages. Regarding indirect translations, the ultimate source language has a greater impact on the linguistic composition of indirect Finnish translations than their respective mediating languages. Hence, the indirect translations could not be reliably identified. Therefore, our results suggest that the reliable computational identification of indirect translations and their mediating languages requires a way to control for the effect of the ultimate source language.

Keywords: indirect translation, literary translation, supervised machine learning, source language identification, corpus-based translation studies, Finnish
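The n-gram feature sets mentioned in the abstract can be illustrated with a minimal sketch. It covers only feature extraction (sequential n-grams over part-of-speech tags and character 3-grams); the random forest step is left out. The function names, the use of relative frequencies as feature values, and the sample tag sequence and text are assumptions made for illustration, not the authors' implementation.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Iterable, Sequence


def sequential_ngrams(tokens: Sequence[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous n-gram in a sequence of tokens or tags."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return every contiguous character n-gram in a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def relative_frequencies(items: Iterable[Hashable]) -> dict[Hashable, float]:
    """Map each distinct item to its share of all items (assumed weighting)."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {item: count / total for item, count in counts.items()}


# Hypothetical Universal Dependencies POS tags for one short sentence.
upos = ["PRON", "VERB", "DET", "NOUN", "PUNCT"]
pos_bigrams = relative_frequencies(sequential_ngrams(upos, 2))

# Illustrative Finnish text fragment.
char_trigrams = relative_frequencies(char_ngrams("kerro minulle", 3))
```

Each text would yield one such frequency dictionary per feature set; aligning the dictionaries into a shared feature matrix and training a classifier on it are separate steps not shown here.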

Article outline

1. Introduction
2. Related work
- 2.1 Distinguishing translations and nontranslations
- 2.2 Classifying direct translations according to their source language
- 2.3 Classifying indirect translations according to their linguistic features
3. Data and methods
- 3.1 Data description and preprocessing
- 3.2 Feature sets
  - 3.2.1 Sequential n-grams
  - 3.2.2 Dependency 2-grams
  - 3.2.3 Positional 1-grams
  - 3.2.4 Character 3-grams
- 3.3 Experimental setup and statistical evaluation
4. Results
- 4.1 Distinguishing translated and nontranslated Finnish
- 4.2 Classifying Finnish translations based on their source languages
- 4.3 Classifying indirect Greek–Finnish translations according to their mediating languages
5. Conclusions
Notes
References

