In: Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations
Edited by Julia Lavid-López, Carmen Maíz-Arévalo and Juan Rafael Zamorano-Mansilla
[Benjamins Translation Library 158] 2021
pp. 307–323
Chapter 12. Exploring variation in translation with probabilistic language models
Published online: 8 December 2021
https://doi.org/10.1075/btl.158.12kar
Abstract
While some authors have suggested that translationese fingerprints are universal, others have shown that
translations vary considerably owing to source-language shining through, translation type, or translation mode. In this
work, we seek empirical insights into variation in translation, focusing on translation mode (translation vs.
interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output
from comparable original text or speech, as well as from each other, at different linguistic levels. We use relative entropy
(Kullback-Leibler divergence) to compare probabilistic language models and visualize the results with word clouds. Our analysis
shows differences in typical words between originals and non-originals, as well as between translation modes, at both lexical and
grammatical levels.
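As a rough illustration of the relative-entropy comparison named in the abstract, the sketch below builds two additively smoothed unigram models and computes the Kullback-Leibler divergence D(P || Q) together with each word's contribution to it. The toy sentences, the function names, the unigram assumption, and the smoothing constant are all illustrative assumptions; this is not the chapter's actual models, corpus, or pipeline.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram probabilities over a shared vocabulary.

    alpha is a hypothetical smoothing constant; it keeps every probability
    non-zero so that the log ratio below is always defined.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kld_contributions(p, q):
    """Per-word terms p(w) * log2(p(w) / q(w)); their sum is D(P || Q) in bits."""
    return {w: p[w] * math.log2(p[w] / q[w]) for w in p}


# Toy stand-ins for a translated and a comparable original sub-corpus (assumed data).
translated = "we would like to thank the commission for the report".split()
original = "we thank the commission for its report and the work".split()

vocab = set(translated) | set(original)
p = unigram_model(translated, vocab)  # model of the translated text
q = unigram_model(original, vocab)    # model of the original text

contrib = kld_contributions(p, q)
kld = sum(contrib.values())

# Words with the largest positive contribution are the ones most typical of
# the translated sample relative to the original sample.
top = sorted(contrib, key=contrib.get, reverse=True)[:3]
print(f"D(translated || original) = {kld:.4f} bits")
print("most distinctive words:", top)
```

Because the divergence is asymmetric, swapping `p` and `q` gives the words typical of the original sample instead. Per-word contributions of this kind are the sort of weights that could drive word size in a word-cloud visualization.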
Article outline
- 1. Introduction
- 2. Corpus data
- 3. Methods
- 3.1 Probabilistic language models and analysis of translation variation
- 3.2 Comparing language models by relative entropy
- 4. Analysis and results
- 4.1 Translation direction: Originals vs. Translation/Interpreting
- 4.2 Translation mode: Translation vs. Interpreting
- 5. Summary and discussion
- Acknowledgements
- Notes
- References
