Structure and Usage of the Tartu University Corpus of Written Estonian

Hennoste, T.; Koit, Mare; Roosmaa, T.; Saluveer, M.

doi:10.1075/ijcl.3.2.06hen

Article published in: International Journal of Corpus Linguistics
Vol. 3:2 (1998), pp. 279–304


T. Hennoste | University of Tartu, Estonia

Mare Koit | University of Tartu, Estonia

T. Roosmaa | University of Tartu, Estonia

M. Saluveer | University of Tartu, Estonia

Published online: 1 January 1998

This paper provides an overview of the first computer corpus of the Estonian language, compiled at the University of Tartu by the university's interdepartmental computational linguistics research group. The corpus is based on the design principles of the LOB and Brown corpora. Its main part was assembled between 1991 and 1995 and contains about 1 million textual words. The paper surveys the text groups in the corpus, describes the problems the compilers had to solve together with the proposed solutions, and outlines the main differences from the model corpora and the reasons for them. This is followed by a review of the available computer routines for processing the corpus.

Keywords: Using a Corpus, Tagging, Designing, Corpora

Cited by two other publications

Conrad, Susan M. & Kimberly R. LeVelle

2008. Corpus Linguistics and Second Language Instruction. In The Handbook of Educational Linguistics, pp. 539 ff.

Sardinha, Tony Berber

2000. Lingüística de Corpus: histórico e problemática. DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada 16:2, pp. 323 ff.