In: Parallel Corpora for Contrastive and Translation Studies: New resources and applications
Edited by Irene Doval and M. Teresa Sánchez Nieto
[Studies in Corpus Linguistics 90] 2019
pp. 123–139
Building EPTIC
A many-sided, multi-purpose corpus of EU parliament proceedings
Published online: 20 March 2019
https://doi.org/10.1075/scl.90.08fer
This chapter describes the steps involved in the construction of EPTIC, an intermodal corpus of European Parliament speeches. Despite its limited size, this corpus has features that justify its labour-intensive building process, in particular its multiple alignments. The text-to-text alignments allow users to compare interpretations and translations of source speeches and their written-up reports, while text-to-video alignments allow them to access the multimedia components from concordance lines. To illustrate the potential of EPTIC, a case study is presented of English loan words in original, translated and interpreted Italian and French. Results suggest that borrowing is more likely to occur in translated Italian than in any of the other corpus components.
Article outline
- 1. Introduction: Why another corpus of European Parliament speeches?
- 2. What EPTIC looks like
- 2.1 One corpus, fourteen subcorpora
- 2.2 Practical details: Size and availability
- 3. Building EPTIC
- 3.1 Selecting and obtaining raw corpus materials
- 3.2 Transcribing the oral data
- 3.3 Adding metadata
- 3.4 Performing text-to-text alignment
- 3.5 Performing text-to-video alignment
- 3.6 POS-tagging, lemmatization and indexing
- 4. An example: English loan words in Italian and French
- 5. Conclusion: Teaming up
- Acknowledgement
- Notes
- References
