Describing lexical patterns in simultaneously interpreted discourse in a parallel aligned corpus of Russian-English interpreting (SIREN)
Published online: 26 November 2018
https://doi.org/10.1075/forum.17004.day
Abstract
The paper introduces a corpus of simultaneous interpretation, SIREN. SIREN is a parallel aligned bidirectional corpus of original
and simultaneously interpreted speech in Russian and English. At the moment the corpus contains 235,040 words and is enriched with
POS and shallow syntactic annotation.
After outlining the corpus design, I use scores for lexical variety, lexical density and POS proportionalities to make tentative
claims about the linguistic variation between originals and interpretations. Low lexical variety and density are taken as
indicators of simplification, while a higher ratio of nominal to pronominal reference is seen as an indicator of explicitation.
Atypical word-class distribution is taken to indicate the source language shining through. The results are somewhat contradictory:
the Russian subcorpus conforms to the predictions of translation theory, while the English subcorpus exhibits the opposite trend
in all universals yet still shows shining through. These results invite further investigation of the data and once again call
into question unequivocal claims about T-universals.
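The scores named in the abstract can be illustrated with a minimal sketch. It assumes the common operationalisations: lexical variety as a type/token ratio, lexical density as the share of content words, and explicitation as the ratio of nouns to pronouns. The paper's exact definitions, normalisation and tagset may differ. The function name `lexical_scores`, the tag labels and the sample sentence are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Assumed coarse POS labels; the tagset actually used for SIREN may differ.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def lexical_scores(tagged):
    """Compute simple lexical scores for one text.

    tagged: list of (token, pos) pairs, e.g. the output of a POS tagger.
    """
    total = len(tagged)
    if total == 0:
        raise ValueError("cannot score an empty text")

    tokens = [tok.lower() for tok, _ in tagged]
    pos_counts = Counter(pos for _, pos in tagged)

    content_words = sum(pos_counts[tag] for tag in CONTENT_TAGS)
    nouns = pos_counts["NOUN"]
    pronouns = pos_counts["PRON"]

    return {
        # Lexical variety: distinct word forms divided by running words.
        "variety": len(set(tokens)) / total,
        # Lexical density: content words divided by running words.
        "density": content_words / total,
        # Nominal vs. pronominal reference; undefined without pronouns.
        "noun_pron_ratio": nouns / pronouns if pronouns else float("inf"),
        # Share of each word class, for comparing POS distributions.
        "pos_proportions": {pos: n / total for pos, n in pos_counts.items()},
    }


if __name__ == "__main__":
    sample = [
        ("The", "DET"), ("cat", "NOUN"), ("saw", "VERB"), ("the", "DET"),
        ("dog", "NOUN"), ("it", "PRON"), ("ran", "VERB"), ("fast", "ADV"),
    ]
    for name, value in lexical_scores(sample).items():
        print(name, value)
```

A raw type/token ratio depends on text length, so comparing subcorpora of different sizes would in practice call for a standardised variant computed over fixed-size chunks. The sketch leaves that step out.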
Résumé
L’article présente SIREN, un corpus d’interprétation simultanée. SIREN est un corpus bidirectionnel aligné en
parallèle et constitué du discours source et de son interprétation simultanée en russe et en anglais. Actuellement, le corpus
contient 235 040 mots et est enrichi avec POS et des annotations syntaxiques superficielles.
Après une présentation de la conception du corpus, j’utilise des scores tels que la variété lexicale, la densité et les
proportionnalités POS pour émettre des hypothèses préliminaires sur la variation linguistique entre les discours sources et les
interprétations. La faible variété et la densité lexicales sont considérées comme des indicateurs de simplification, tandis
qu’un rapport plus élevé entre la référence nominale et la référence pronominale est considéré comme un indicateur
d’explicitation. La distribution atypique de classe de mots indique une démarcation dans la langue source. Des résultats
quelque peu contradictoires, avec le sous-corpus russe conforme aux prédictions de la théorie de la traduction et le sous-corpus
anglais montrant une tendance opposée dans tous les universaux tout en se démarquant, demandent des analyses supplémentaires des
données et, une fois de plus, remettent en question les hypothèses irréfutables concernant les universaux de la traduction.
Article outline
- 1. Introduction: Corpus-based interpreting studies
- 1.1 First steps. Planning the corpus and obtaining the data
- 1.2 Data sources and television interpreting
- 1.3 Transcription and annotation
- 2. Preliminary findings from SIREN: The description of lexical complexity
- 3. Discussion and conclusion
- Acknowledgements
- Notes
