In: Observing Eurolects: Corpus analysis of linguistic variation in EU law
Edited by Laura Mori
[Studies in Corpus Linguistics 86] 2018
pp. 27–45
Chapter 2. The Eurolect Observatory Multilingual Corpus
Construction and query tools
Published online: 6 December 2018
https://doi.org/10.1075/scl.86.02tom
Abstract
This chapter explains the design of the Eurolect Observatory Multilingual Corpus and the steps required to build the monolingual corpora that the project needed to meet its research objectives. The first two sections after the general introduction point out the differences and overlaps among the corpora, which the author produced as a member of the UNINT research team and which were used for text mining in the Eurolect Observatory Project. After defining the data collection and corpus building strategies adopted, the chapter describes the corpus search tool that was developed to help scholars find and save text samples from the whole corpus conveniently.
Article outline
- 1. Introduction
- 2. Corpus collection
- 2.1 Corpus A
- 2.2 Corpus B
- 3. Corpus search tools
- 3.1 Overview of the SearchIt tools
- 3.2 Main functions of the SearchIt tools
References (10)
Barbera, E., Corino, E., & Onesti, C. (2007). Cosa è un corpus? Per una definizione più rigorosa di corpus, token, markup. In E. Barbera, E. Corino, & C. Onesti (Eds.), Corpora e linguistica in rete (pp. 25–88). Perugia: Guerra Edizioni.
Burnage, G., & Dunlop, D. (1992). Encoding the British National Corpus. In J. Aarts, P. de Haan, & N. Oostdijk (Eds.), English language corpora: Design, analysis and exploitation. Papers from the Thirteenth International Conference on English Language Research on Computerized Corpora, Nijmegen 1992 (pp. 79–95). Amsterdam: Rodopi.
Gillam, R. (2003). Unicode demystified: A practical programmer’s guide to the encoding standard. Boston MA: Addison-Wesley.
Kenning, M.-M. (2010). What are parallel and comparable corpora and how can we use them? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 487–500). London: Routledge.
Lenci, A., Montemagni, S., & Pirrelli, V. (2016). Testo e computer. Elementi di linguistica computazionale. Roma: Carocci.
Mori, L. (2018). Introduction: The Eurolect Observatory Project. In L. Mori (Ed.), Observing Eurolects. Corpus analysis of linguistic variation in EU law (Studies in Corpus Linguistics 86). Amsterdam: John Benjamins. (this volume).
Reppen, R. (2010). Building a corpus: What are the key considerations? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 31–37). London: Routledge.
Robbins, A. (2015). Effective AWK programming: Universal text processing and pattern matching. Sebastopol, CA: O’Reilly Media.
