Automatic term recognition based on statistics of compound nouns and their components

Nakagawa, Hiroshi; Mori, Tatsunori

doi:10.1075/term.9.2.04nak

Article published in: Terminology
Vol. 9:2 (2003) ► pp. 201–219

Published online: 4 February 2004

In this paper, we propose a new approach to enhancing systems for the automatic recognition of domain-specific terms. The approach is based on statistics about the relation between a compound noun and its constituent simple nouns. More precisely, we focus on how many nouns adjoin a given simple noun to form compound nouns. We propose several scoring methods based on this approach and evaluate them experimentally on the NTCIR1 TMREC test collection. The results are very promising, especially at low and high recall levels.
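The adjacency statistic described above can be illustrated with a small sketch. It assumes compound nouns are already segmented into tuples of simple nouns, counts how many distinct nouns appear directly to the left and to the right of each simple noun, and scores a compound by the geometric mean of (left + 1) and (right + 1) over its constituents. The function names `adjacency_counts` and `compound_score`, the choice to count distinct neighbours rather than occurrences, and the +1 smoothing are all illustrative assumptions, not necessarily the exact scoring methods evaluated in the paper.

```python
from collections import defaultdict


def adjacency_counts(compounds):
    """For each simple noun, count the distinct nouns that appear directly
    to its left and directly to its right inside the given compound nouns.

    `compounds` is a list of compound nouns, each a tuple of simple nouns.
    Counting distinct neighbours (rather than occurrences) is an assumption.
    """
    left = defaultdict(set)
    right = defaultdict(set)
    for compound in compounds:
        for i, noun in enumerate(compound):
            if i > 0:
                left[noun].add(compound[i - 1])
            if i < len(compound) - 1:
                right[noun].add(compound[i + 1])
    nouns = {noun for compound in compounds for noun in compound}
    # Map each noun to (number of distinct left neighbours, number of distinct right neighbours).
    return {noun: (len(left[noun]), len(right[noun])) for noun in nouns}


def compound_score(compound, counts):
    """Hypothetical score: geometric mean of (left + 1) and (right + 1)
    over every constituent, i.e. their product raised to 1 / (2 * length).
    The +1 keeps a noun with no neighbours on one side from zeroing the score.
    """
    product = 1
    for noun in compound:
        n_left, n_right = counts.get(noun, (0, 0))
        product *= (n_left + 1) * (n_right + 1)
    return product ** (1.0 / (2 * len(compound)))


if __name__ == "__main__":
    # Toy input: compound nouns already segmented into simple nouns.
    compounds = [
        ("information", "retrieval"),
        ("information", "system"),
        ("retrieval", "system"),
        ("term", "extraction"),
    ]
    counts = adjacency_counts(compounds)
    print(counts["information"])  # (0, 2)
    print(round(compound_score(("information", "retrieval"), counts), 3))  # 1.861
    print(round(compound_score(("term", "extraction"), counts), 3))  # 1.414
```

In this toy list, "information retrieval" outscores "term extraction" because its constituents combine with more different nouns. Weighting such a score by the compound's own frequency is one natural variant; which variants the paper actually evaluates should be checked against the full text.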

Keywords: domain terminology, extraction, compound nouns, simple nouns, statistics

Cited by 47 other publications

Nugumanova, Aliya, Darkhan Akhmed-Zaki, Madina Mansurova, Yerzhan Baiburin & Almasbek Maulit

2022. NMF-based approach to automatic term extraction. Expert Systems with Applications 199 ► pp. 117179 ff.

Erjavec, Tomaž, Darja Fišer & Nikola Ljubešić

2021. The KAS corpus of Slovenian academic writing. Language Resources and Evaluation 55:2 ► pp. 551 ff.

Kimura, Yusuke, Kazuma Kusu, Kenji Hatano & Tokiya Baba

2021. Automatic Terminology Extraction Using a Dependency-Graph in NLP. In Innovations in Bio-Inspired Computing and Applications [Advances in Intelligent Systems and Computing, 1372], ► pp. 411 ff.

Kaļiņina, Irina

2020. Specifics of Translating Osteopathic Terminology from English into Latvian. Baltic Journal of English Language, Literature and Culture 10 ► pp. 54 ff.

Hautasaari, Ari, Takeo Hamada, Kuntaro Ishiyama & Shogo Fukushima

2019. VocaBura. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3:4 ► pp. 1 ff.

Ljubešić, Nikola, Darja Fišer & Tomaž Erjavec

2019. KAS-term: Extracting Slovene Terms from Doctoral Theses via Supervised Machine Learning. In Text, Speech, and Dialogue [Lecture Notes in Computer Science, 11697], ► pp. 115 ff.

Liu, Ying, Tianlin Zhang, Pei Quan, Yueran Wen, Kaichao Wu & Hongbo He

2018. A Novel Parsing-Based Automatic Domain Terminology Extraction Method. In Computational Science – ICCS 2018 [Lecture Notes in Computer Science, 10862], ► pp. 796 ff.

Riedl, Martin & Chris Biemann

2018. Using Semantics for Granularities of Tokenization. Computational Linguistics 44:3 ► pp. 483 ff.

Sivashankari, R. & B. Valarmathi

2018. NLP-MTFLR: Document-Level Prioritization and Identification of Dominant Multi-word Named Products in Customer Reviews. Arabian Journal for Science and Engineering 43:2 ► pp. 843 ff.

Yoshizawa, Go, Rinie van Est, Daisuke Yoshinaga, Mikihito Tanaka, Ryuma Shineha & Akihiko Konagaya

2018. Responsible innovation in molecular robotics in Japan. Chem-Bio Informatics Journal 18:0 ► pp. 164 ff.

de Santiago González, Paula & Larisa Grcic Simeunovic

2017. The Polymorphic Behaviour of Adjectives in Terminography. Meta 62:1 ► pp. 201 ff.

Seneviratne, Dilesha, Shlomo Geva, Guido Zuccon & Andrew Trotman

2017. Proceedings of the 22nd Australasian Document Computing Symposium, ► pp. 1 ff.

Kolesnikova, Olga & Alexander Gelbukh

2015. Measuring Non-compositionality of Verb-Noun Collocations Using Lexical Functions and WordNet Hypernyms. In Advances in Artificial Intelligence and Its Applications [Lecture Notes in Computer Science, 9414], ► pp. 3 ff.

Kolesnikova, Olga & Alexander Gelbukh

2017. Characteristics of Most Frequent Spanish Verb-Noun Combinations. In Advances in Computational Intelligence [Lecture Notes in Computer Science, 10061], ► pp. 27 ff.

Le, Tho Thi Ngoc, Kiyoaki Shirai, Minh Le Nguyen & Akira Shimazu

2015. Extracting indices from Japanese legal documents. Artificial Intelligence and Law 23:4 ► pp. 315 ff.

Mezghanni, Imen Bouaziz & Faiez Gargouri

2015. 2015 5th International Conference on Information & Communication Technology and Accessibility (ICTA), ► pp. 1 ff.

Takase, Haruhiko, Hiroharu Kawanaka & Shinji Tsuruoka

2015. Supporting System for Quiz in Large Class – Automatic Keyword Extraction and Browsing Interface –. Journal of Advanced Computational Intelligence and Intelligent Informatics 19:1 ► pp. 152 ff.

Domoto, Kentaro, Takehito Utsuro, Naoki Sawada & Hiromitsu Nishizaki

2014. 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), ► pp. 1 ff.

Takada, Tomoki, Mizuki Arai & Tomohiro Takagi

2014. Automatic Keyword Annotation System Using Newspapers. Journal of Advanced Computational Intelligence and Intelligent Informatics 18:3 ► pp. 340 ff.

Bagheri, Ayoub, Mohamad Saraee & Franciska de Jong

2013. Care more about customers: Unsupervised domain-independent aspect detection for sentiment analysis of customer reviews. Knowledge-Based Systems 52 ► pp. 201 ff.

Bagheri, Ayoub, Mohamad Saraee & Franciska de Jong

2013. An Unsupervised Aspect Detection Model for Sentiment Analysis of Reviews. In Natural Language Processing and Information Systems [Lecture Notes in Computer Science, 7934], ► pp. 140 ff.

Yasuda, Keiji & Eiichiro Sumita

2013. Building a Bilingual Dictionary from a Japanese-Chinese Patent Corpus. In Computational Linguistics and Intelligent Text Processing [Lecture Notes in Computer Science, 7817], ► pp. 276 ff.

Hayama, Tessai & Susumu Kunifuji

2012. Information Provision Modules to Support Creation of Slides with Easily Understandable Presentation. International Journal of Knowledge and Systems Science 3:3 ► pp. 26 ff.

Kageura, Kyo & Ryo Murayama

2012. QRpac: User-Driven Archiving of Parallel and Comparable Documents from the Web. In The Outreach of Digital Libraries: A Globalized Resource Network [Lecture Notes in Computer Science, 7634], ► pp. 321 ff.

Choi, Yun-Soo, Sa-Kwang Song, Hong-Woo Chun, Chang-Hoo Jeong & Sung-Pil Choi

2011. Terminology Recognition System based on Machine Learning for Scientific Document Analysis. The KIPS Transactions: Part D 18D:5 ► pp. 329 ff.

Kang, Jingjing, Tao Liu, He Hu & Xiaoyong Du

2011. 2011 Sixth Annual Chinagrid Conference, ► pp. 60 ff.

Song, Sa-Kwang, Yun-Soo Choi, Hong-Woo Chun, Chang-Hoo Jeong, Sung-Pil Choi & Won-Kyung Sung

2011. Multi-words Terminology Recognition Using Web Search. In U- and E-Service, Science and Technology [Communications in Computer and Information Science, 264], ► pp. 233 ff.

Francesconi, Enrico, Simonetta Montemagni, Wim Peters & Daniela Tiscornia

2010. Integrating a Bottom–Up and Top–Down Methodology for Building Semantic Resources for the Multilingual Legal Domain. In Semantic Processing of Legal Texts [Lecture Notes in Computer Science, 6036], ► pp. 95 ff.

Kang, Jingjing, Xiaoyong Du, Tao Liu & He Hu

2010. Automatic Domain Terminology Extraction Using Graph Mutual Reinforcement. In Web-Age Information Management [Lecture Notes in Computer Science, 6184], ► pp. 656 ff.

Ureña Gómez-Moreno, José Manuel & Pamela Faber

2010. Strategies for the Semi-Automatic Retrieval of Metaphorical Terms. Metaphor and Symbol 26:1 ► pp. 23 ff.

Yoshida, Minoru, Masaki Ikeda, Shingo Ono, Issei Sato & Hiroshi Nakagawa

2010. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, ► pp. 10 ff.

Boulaknadel, Siham, Beatrice Daille & Driss Aboutajdine

2008. 2008 IEEE Symposium on Computers and Communications, ► pp. 869 ff.

Fan, Teng-Kai & Chia-Hui Chang

2008. 2008 The Eighth IAPR International Workshop on Document Analysis Systems, ► pp. 574 ff.

Ono, Shingo, Issei Sato, Minoru Yoshida & Hiroshi Nakagawa

2008. Person Name Disambiguation in Web Pages Using Social Network, Compound Words and Latent Topics. In Advances in Knowledge Discovery and Data Mining [Lecture Notes in Computer Science, 5012], ► pp. 260 ff.

Uchiyama, Kiyoko, Shunsuke Aihara & Shun Ishizaki

2008. Identifying Semantic Relations in Japanese Compound Nouns for Patent Documents Analysis. In Large-Scale Knowledge Resources. Construction and Application [Lecture Notes in Computer Science, 4938], ► pp. 75 ff.

Fukuhara, Tomohiro, Toshihiro Murayama & Toyoaki Nishida

2007. Analyzing concerns of people from Weblog articles. AI & SOCIETY 22:2 ► pp. 253 ff.

Horyu, Daisuke & Seishi Ninomiya

2007. Additional Selection of Extracted Terms for a Specific Area. Agricultural Information Research 16:2 ► pp. 52 ff.

Kida, Mitsuhiro, Masatsugu Tonoike, Takehito Utsuro & Satoshi Sato

2007. Domain classification of technical terms using the Web. Systems and Computers in Japan 38:14 ► pp. 11 ff.

Matsuo, Yutaka, Junichiro Mori, Masahiro Hamasaki, Takuichi Nishimura, Hideaki Takeda, Koiti Hasida & Mitsuru Ishizuka

2007. POLYPHONET: An Advanced Social Network Extraction System from the Web. SSRN Electronic Journal

Utsuro, Takehito, Masatsugu Tonoike, Satoshi Sato & Sadao Kurohashi

2007. Second International Conference on Informatics Research for Development of Knowledge Society Infrastructure (ICKS'07), ► pp. 27 ff.

Namatame, Miki, Yasushi Harada, Fusako Kusunoki, Shigenori Inagaki & Takao Terano

2006. Proceedings of the 2006 ACM SIGCHI international conference on Advances in computer entertainment technology, ► pp. 51 ff.

Utsuro, Takehito, Mitsuhiro Kida, Masatsugu Tonoike & Satoshi Sato

2006. Towards Automatic Domain Classification of Technical Terms: Estimating Domain Specificity of a Term Using the Web. In Information Retrieval Technology [Lecture Notes in Computer Science, 4182], ► pp. 633 ff.

Utsuro, Takehito, Mitsuhiro Kida, Masatsugu Tonoike & Satoshi Sato

2006. Collecting Novel Technical Terms from the Web by Estimating Domain Specificity of a Term. In Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead [Lecture Notes in Computer Science, 4285], ► pp. 173 ff.

Pazienza, Maria Teresa, Marco Pennacchiotti & Fabio Massimo Zanzotto

2005. Terminology Extraction: An Analysis of Linguistic and Statistical Approaches. In Knowledge Mining [Studies in Fuzziness and Soft Computing, 185], ► pp. 255 ff.

Spasic, I., S. Ananiadou & J. Tsujii

2005. MaSTerClass: a case-based reasoning system for the classification of biomedical terms. Bioinformatics 21:11 ► pp. 2748 ff.

Yoshida, Minoru & Hiroshi Nakagawa

2005. Automatic Term Extraction Based on Perplexity of Compound Words. In Natural Language Processing – IJCNLP 2005 [Lecture Notes in Computer Science, 3651], ► pp. 269 ff.

[no author supplied]

2022. Theoretical Perspectives on Terminology [Terminology and Lexicography Research and Practice, 23],

This list is based on CrossRef data as of 6 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.