In: Parallel Corpora for Contrastive and Translation Studies: New resources and applications
Edited by Irene Doval and M. Teresa Sánchez Nieto
[Studies in Corpus Linguistics 90] 2019
pp. 233–247
An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus
Published online: 20 March 2019
https://doi.org/10.1075/scl.90.14san
Since the 1980s, considerable efforts have been made to create different types of Basque corpora. However, to systematically analyse the Basque translations of German literary texts, it was necessary to create a corpus from the ground up. Intermediary versions were included in this corpus whenever the Basque target text was not translated directly from the German original but from a translation into another language (Spanish in most cases). A tool called TAligner was used to align the bitexts and the tritexts. The aim of this chapter is, firstly, to provide the reader with an overview of the main Basque corpora. Secondly, I will describe the design and compilation of a parallel, multilingual corpus using TAligner 3.0. Thirdly, I will explain how the corpus was lemmatized and annotated with part-of-speech tags. Finally, I will show how potential Basque multi-word expressions were extracted.
Keywords: Basque corpora, Aleuska corpus, TAligner, Basque MWEs
Article outline
- 1. Introduction
- 2. An overview of Basque corpora
- 3. Design, compilation and annotation of the Aleuska corpus
- 4. Extraction of MWEs
- 5. Conclusion
- Notes
- References
Cited by (2)
Pérez Blanco, María & Marlén Izquierdo. 2021. Developing a corpus-informed tool for Spanish professionals writing specialised texts in English. In Corpora in Translation and Contrastive Research in the Digital Age [Benjamins Translation Library, 158], pp. 147 ff.
Sanz-Villar, Zuriñe & Olaia Andaluz-Pinedo. 2021. TAligner 3.0. In Corpora in Translation and Contrastive Research in the Digital Age [Benjamins Translation Library, 158], pp. 125 ff.
This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
