Article published in: Terminology
Vol. 7:2 (2001), pp. 239–257
Automatic acquisition and classification of terminology using a tagged corpus in the molecular biology domain
Published online: 22 April 2002
https://doi.org/10.1075/term.7.2.07col
This article describes our work on identifying and classifying terms in the domain of molecular biology, based on examples marked up by a domain expert in a corpus of abstracts taken from a controlled search of the Medline database. Automatic acquisition of biomedical term lists has so far been slow because of high variability in both the terms and their classification scheme, which we attribute to the diversity of research disciplines involved. Nevertheless, the explosive growth in online molecular biology literature makes a persuasive case for automating many tasks. These include the acquisition of records for gene-product databases such as SwissProt, which are currently updated by human experts, a task that is both time-consuming and often highly idiosyncratic. In this article we report results from a tool based on a hidden Markov model for extracting and classifying terms, which can be used as a key component in an information extraction system. We discuss the results in light of the lexical, syntactic and semantic properties of terms revealed by our study.
Keywords: information extraction, molecular biology, named entity
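
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of how a first-order hidden Markov model can assign class labels to tokens using Viterbi decoding. It is an illustration only and not the tool reported in the article: the label set (`O`, `PROTEIN`), the toy probabilities, the example sentence and the fixed penalty for unseen words are all assumptions made for this sketch.

```python
import math

def to_log(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}

# Hypothetical labels and toy parameters (not taken from the article).
STATES = ["O", "PROTEIN"]
LOG_START = to_log({"O": 0.8, "PROTEIN": 0.2})
LOG_TRANS = {
    "O": to_log({"O": 0.8, "PROTEIN": 0.2}),
    "PROTEIN": to_log({"O": 0.4, "PROTEIN": 0.6}),
}
LOG_EMIT = {
    "O": to_log({"the": 0.5, "activates": 0.3, "and": 0.2}),
    "PROTEIN": to_log({"interleukin-2": 0.5, "receptor": 0.4, "kinase": 0.1}),
}
UNSEEN = -20.0  # assumed log-probability for a word a state has never emitted

def viterbi(tokens, states, log_start, log_trans, log_emit, unseen=UNSEEN):
    """Return the most probable label sequence for tokens."""
    if not tokens:
        return []
    # Each column maps state -> (best log score ending in that state, previous state).
    columns = [{
        s: (log_start[s] + log_emit[s].get(tokens[0], unseen), None)
        for s in states
    }]
    for tok in tokens[1:]:
        prev_col = columns[-1]
        col = {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            best_prev = max(states, key=lambda p: prev_col[p][0] + log_trans[p][s])
            score = prev_col[best_prev][0] + log_trans[best_prev][s]
            col[s] = (score + log_emit[s].get(tok, unseen), best_prev)
        columns.append(col)
    # Follow the back-pointers from the best final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for col in reversed(columns[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]

tokens = ["the", "interleukin-2", "receptor", "activates"]
print(viterbi(tokens, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
# prints ['O', 'PROTEIN', 'PROTEIN', 'O']
```

In a real system the start, transition and emission parameters would be estimated from the expert-annotated corpus rather than written by hand, and unseen words would need a more careful treatment than a single constant.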
