Distributed specificity for automatic terminology extraction

Amjadian, Ehsan; Inkpen, Diana; Paribakht, T. Sima; Faez, Farahnaz

doi:10.1075/term.00012.amj

Article published in: Computational terminology and filtering of terminological information
Edited by Patrick Drouin, Natalia Grabar, Thierry Hamon, Kyo Kageura and Koichi Takeuchi
[Terminology 24:1] 2018
pp. 23–40

Ehsan Amjadian | Carleton University, Canada | University of Ottawa, Canada

Diana Inkpen | University of Ottawa, Canada

T. Sima Paribakht | University of Ottawa, Canada

Farahnaz Faez | Western University, Canada

Published online: 31 May 2018

https://doi.org/10.1075/term.00012.amj

Abstract

The present article explores two novel methods that integrate distributed representations with terminology extraction. Both methods assess the specificity of a word (unigram) to the target corpus by leveraging its distributed representation in the target domain as well as in the general domain. The first approach uses this distributed specificity as a filter, and the second applies it directly to the corpus. The filter can be mounted on top of any other Automatic Terminology Extraction (ATE) method, allows any number of other ATE methods to be merged, and achieves remarkable results with minimal training. The direct approach does not perform as well as the filtering approach, but it reaffirms that, when distributed specificity is used as the words' representation, very little data is required to train an ATE classifier. This finding encourages the development of more minimally supervised ATE algorithms in the future.
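The abstract describes the two uses of distributed specificity only at a high level. The following is a minimal sketch of the general idea, not the authors' implementation, under these loudly stated assumptions: the specificity representation is taken to be the concatenation of a word's vector from a domain-trained model and its vector from a general-domain model; a nearest-centroid classifier stands in for whatever classifier the article actually trains; and candidate lists from several ATE methods are merged by set union before filtering. All words, vectors, and names (`specificity_vector`, `filter_candidates`, `direct_extract`, `DOMAIN_EMB`, `GENERAL_EMB`) are hypothetical toy values.

```python
from typing import Dict, Iterable, List, Optional, Sequence

Vector = List[float]

# Hypothetical toy embeddings (NOT the article's data). In practice the first
# table would come from a model trained on the target-domain corpus and the
# second from a general-domain model.
DOMAIN_EMB: Dict[str, Vector] = {
    "derivative": [0.9, 0.1], "asymptote": [0.8, 0.2], "vertex": [0.85, 0.15],
    "chapter": [0.2, 0.8], "page": [0.1, 0.9], "student": [0.15, 0.85],
}
GENERAL_EMB: Dict[str, Vector] = {
    "derivative": [0.3, 0.7], "asymptote": [0.2, 0.8], "vertex": [0.25, 0.75],
    "chapter": [0.2, 0.8], "page": [0.1, 0.9], "student": [0.15, 0.85],
}


def specificity_vector(word: str, domain_emb: Dict[str, Vector],
                       general_emb: Dict[str, Vector]) -> Optional[Vector]:
    """Assumed form: concatenate the word's domain vector and general vector.

    Returns None if the word is missing from either embedding table.
    """
    if word not in domain_emb or word not in general_emb:
        return None
    return domain_emb[word] + general_emb[word]  # list concatenation


def _centroid(vectors: Sequence[Vector]) -> Vector:
    # Component-wise mean of a non-empty list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def _sq_dist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroidClassifier:
    """Stand-in classifier: a word is a term if its vector lies closer to the
    term centroid than to the non-term centroid."""

    def fit(self, term_vecs: Sequence[Vector], nonterm_vecs: Sequence[Vector]):
        self.term_c = _centroid(term_vecs)
        self.nonterm_c = _centroid(nonterm_vecs)
        return self

    def is_term(self, vec: Vector) -> bool:
        return _sq_dist(vec, self.term_c) < _sq_dist(vec, self.nonterm_c)


def train(terms: Iterable[str], nonterms: Iterable[str],
          domain_emb: Dict[str, Vector], general_emb: Dict[str, Vector]):
    """Fit the stand-in classifier on a handful of labelled words."""
    def vecs(words: Iterable[str]) -> List[Vector]:
        out = []
        for w in words:
            v = specificity_vector(w, domain_emb, general_emb)
            if v is not None:
                out.append(v)
        return out
    return NearestCentroidClassifier().fit(vecs(terms), vecs(nonterms))


def filter_candidates(candidate_lists: Iterable[Iterable[str]], clf,
                      domain_emb: Dict[str, Vector],
                      general_emb: Dict[str, Vector]) -> List[str]:
    """Filtering approach (sketch): merge the candidates of any number of ATE
    methods by set union, then keep those the classifier accepts."""
    merged = set()
    for cands in candidate_lists:
        merged.update(cands)
    kept = []
    for w in sorted(merged):
        v = specificity_vector(w, domain_emb, general_emb)
        if v is not None and clf.is_term(v):
            kept.append(w)
    return kept


def direct_extract(clf, domain_emb: Dict[str, Vector],
                   general_emb: Dict[str, Vector]) -> List[str]:
    """Direct approach (sketch): classify every word of the domain vocabulary."""
    return filter_candidates([domain_emb.keys()], clf, domain_emb, general_emb)


# Four labelled words are enough to train this toy classifier.
clf = train(["derivative", "asymptote"], ["chapter", "page"], DOMAIN_EMB, GENERAL_EMB)
ate_a = ["vertex", "student"]              # output of hypothetical ATE method A
ate_b = ["vertex", "derivative", "xyz"]    # output of hypothetical ATE method B
print(filter_candidates([ate_a, ate_b], clf, DOMAIN_EMB, GENERAL_EMB))  # ['derivative', 'vertex']
print(direct_extract(clf, DOMAIN_EMB, GENERAL_EMB))  # ['asymptote', 'derivative', 'vertex']
```

A nearest-centroid rule was chosen here only because its decisions are easy to check by hand; the exact construction of the specificity vector and the classifier used in the article (Sections 4.1 to 4.3) may differ. Words missing from either embedding table, such as `xyz`, are simply dropped in this sketch.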

Keywords: automatic terminology extraction, neural networks, distributed specificity, representation learning, word embeddings

Article outline

1. Introduction
2. Related work
3. Corpus
4. Methodology
- 4.1 Specificity vector
- 4.2 Filtering approach
- 4.3 Direct approach
5. Annotation
6. Experiments and results
7. Conclusion
8. Future work
Notes
References

Cited by (11)

Delaunay, Julien, Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Georgeta Bordea, Mathilde Ducos, Nicolas Sidere, Antoine Doucet, Senja Pollak & Olivier De Viron

2024. CoastTerm: A Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature. In Text, Speech, and Dialogue [Lecture Notes in Computer Science, 15048], pp. 97 ff.

Lefever, Els & Ayla Rigouts Terryn

2024. Computational Terminology. In New Advances in Translation Technology [New Frontiers in Translation Studies], pp. 141 ff.

McDonnell, Serena, Omar Nada, Nicholas Prayogo, Preston Engstrom, Muhammad Rizwan Abid, Chen Ding & Ehsan Amjadian

2022. 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 0343 ff.

Prayogo, Nicholas, Ehsan Amjadian, Serena McDonnell & Muhammad Rizwan Abid

2022. 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 0359 ff.

Amjadian, Ehsan, Nicholas Prayogo, Serena McDonnell, Cathal Smyth & Muhammad Rizwan Abid

2021. 2021 IEEE Aerospace Conference (50100), pp. 1 ff.

Du, Jiali, Christina Alexantris & Pingfang Yu

2021. Towards Chinese Terminology Application of TERMONLINE. In Advances in Artificial Intelligence, Software and Systems Engineering [Lecture Notes in Networks and Systems, 271], pp. 190 ff.

McDonnell, Serena, Omar Nada, Muhammad Rizwan Abid & Ehsan Amjadian

2021. 2021 IEEE Aerospace Conference (50100), pp. 1 ff.

Rigouts Terryn, Ayla, Véronique Hoste & Els Lefever

2021. HAMLET. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 27:2, pp. 254 ff.

Rigouts Terryn, Ayla, Véronique Hoste & Els Lefever

2022. Tagging terms in text. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 28:1, pp. 157 ff.

Isaeva, Ekaterina & Vadim Bakhtin

2020. Man - Machine Knowledge Mediation: Overview of Deep Learning Methods for Natural Language Processing. In Digital Science 2019 [Advances in Intelligent Systems and Computing, 1114], pp. 44 ff.

[no author supplied]

2022. Theoretical Perspectives on Terminology [Terminology and Lexicography Research and Practice, 23].

This list is based on CrossRef data as of 5 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.