Automatic Extraction of Terminological Translation Lexicon from Czech-English Parallel Texts

Cmejrek, Martin; Curín, Jan

doi:10.1075/ijcl.6.si.02cme

Article published in: Text Corpora and Multilingual Lexicography
Edited by Wolfgang Teubert
[International Journal of Corpus Linguistics 6:SI] 2001
pp. 1–12



Published online: 17 December 2001

https://doi.org/10.1075/ijcl.6.si.02cme

We present experimental results of the automatic extraction of a Czech-English translation dictionary. Two bilingual corpora were created: a computer-oriented corpus of 119,886 sentence pairs and a journalistic corpus of 58,137 sentence pairs. We used a length-based statistical method for sentence alignment (Gale and Church 1991), and a noun phrase marker based on a regular grammar together with a probabilistic model (Brown et al. 1993) for dictionary extraction. Each resulting dictionary contains around 6,000 entries. After significance filtering, weighted precision is 86.4% for the computer-oriented and 70.7% for the journalistic Czech-English dictionary.
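The dictionary extraction step can be illustrated with a minimal sketch. It assumes the simplest of the word-translation models described by Brown et al. (1993), commonly called IBM Model 1; the abstract does not say which model the authors used, and the toy sentence pairs, the function names `train_model1` and `best_translation`, and the iteration count are all hypothetical. The sketch omits the NULL word, noun phrase marking, and significance filtering.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t(english | czech) with EM (IBM Model 1 style, no NULL word).

    `pairs` is a list of (czech_tokens, english_tokens) aligned sentences.
    """
    english_vocab = {e for _, eng in pairs for e in eng}
    uniform = 1.0 / len(english_vocab)
    # Start from a uniform distribution over the English vocabulary.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected (english, czech) link counts
        total = defaultdict(float)  # expected counts per Czech word
        # E-step: distribute each English word over the Czech words
        # of the same sentence pair, in proportion to the current t.
        for cze, eng in pairs:
            for e in eng:
                norm = sum(t[(e, f)] for f in cze)
                for f in cze:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise the expected counts per Czech word.
        t = defaultdict(
            float, {(e, f): c / total[f] for (e, f), c in count.items()}
        )
    return t


def best_translation(t, czech_word):
    """Return the English word with the highest t(english | czech_word)."""
    candidates = {e: p for (e, f), p in t.items() if f == czech_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus, not taken from the paper's data.
pairs = [
    (["nový", "soubor"], ["new", "file"]),
    (["soubor"], ["file"]),
    (["nový", "disk"], ["new", "disk"]),
]
lexicon = train_model1(pairs)
for word in ["soubor", "nový", "disk"]:
    print(word, "->", best_translation(lexicon, word))
```

In a pipeline like the one the abstract describes, the sentence pairs would come from the length-based alignment step, and low-confidence entries would be removed before evaluation.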

Cited by one other publication

Mikhailov, Mikhail

2021. Mind the Source Data! Translation Equivalents and Translation Stimuli from Parallel Corpora. In New Perspectives on Corpus Translation Studies [New Frontiers in Translation Studies], pp. 259 ff.

This list is based on CrossRef data as of 12 December 2025 and may not be complete.