Learning Lessons from Bilingual Corpora: Benefits for Machine Translation

Streiter, Oliver; Iomdin, Leonid L.

doi:10.1075/ijcl.5.2.06str

Article published in: International Journal of Corpus Linguistics, Vol. 5:2 (2000), pp. 199–230

Oliver Streiter | Academia Sinica, Institute of Information Science

Leonid L. Iomdin | Institute for Information Transmission Problems, Russian Academy of Sciences

Published online: 30 May 2001

https://doi.org/10.1075/ijcl.5.2.06str

The research described in this paper stems from efforts to combine the advantages of corpus-based and rule-based MT approaches in order to improve the performance of MT systems, above all the quality of translation. The authors review ongoing activities in the field and present a case study showing how translation knowledge can be drawn from parallel corpora and compiled into the lexicon of a rule-based MT system. This knowledge is obtained with the help of three procedures: (1) identification of hitherto unknown one-word translations, (2) statistical rating of the known one-word translations, and (3) extraction of new translations of multiword expressions (MWEs), followed by compilation steps that create new rules for the MT engine. As a result, the lexicon is enriched with translation equivalents attested for different subject domains, which facilitates tuning the MT system to a specific subject domain and improves the quality and adequacy of translation.
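The abstract does not say which statistic is used in procedure (2). As a rough illustration only, the sketch below rates the known translations of a word by the Dice coefficient over a sentence-aligned toy corpus. The Dice measure, the function names `dice_scores` and `rate_lexicon`, and the sample data are all assumptions made for this example and are not taken from the article.

```python
from collections import Counter
from itertools import product


def dice_scores(aligned_pairs):
    """Score every co-occurring (source, target) word pair with the Dice coefficient.

    aligned_pairs: iterable of (source_sentence, target_sentence) strings.
    Each word is counted at most once per sentence.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in aligned_pairs:
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def rate_lexicon(lexicon, scores):
    """Order the known translations of each entry by corpus score.

    Translations never attested in the corpus receive a score of 0.0.
    """
    return {
        s: sorted(
            ((t, scores.get((s, t), 0.0)) for t in targets),
            key=lambda item: -item[1],
        )
        for s, targets in lexicon.items()
    }


# Hypothetical toy data: three aligned sentence pairs and one lexicon entry.
corpus = [
    ("the house", "das haus"),
    ("the small house", "das kleine haus"),
    ("the car", "das auto"),
]
lexicon = {"house": ["das", "haus", "auto"]}

scores = dice_scores(corpus)
rated = rate_lexicon(lexicon, scores)
print(rated["house"])
```

On this toy corpus the entry for "house" ranks "haus" first and leaves the unattested "auto" last. Running the same rating separately on corpora from different subject domains would give per-domain rankings, which is one way to read the domain tuning the abstract mentions.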

Cited by one other publication

Laukaitis, Algirdas & Olegas Vasilecas. 2007. Asymmetric Hybrid Machine Translation for Languages with Scarce Resources. In Computational Linguistics and Intelligent Text Processing [Lecture Notes in Computer Science, 4394], pp. 397 ff.
