Article published in: Computational terminology and filtering of terminological information
Edited by Patrick Drouin, Natalia Grabar, Thierry Hamon, Kyo Kageura and Koichi Takeuchi
[Terminology 24:1] 2018
pp. 23–40
Distributed specificity for automatic terminology extraction
Published online: 31 May 2018
https://doi.org/10.1075/term.00012.amj
Abstract
The present article explores two novel methods that integrate distributed representations with terminology extraction. Both methods assess the specificity of a word (unigram) to the target corpus by leveraging its distributed representation in the target domain as well as in the general domain. The first approach adopts this distributed specificity as a filter, and the second applies it directly to the corpus. The filter can be mounted on any other Automatic Terminology Extraction (ATE) method, allows the output of any number of other ATE methods to be merged, and achieves remarkable results with minimal training. The direct approach does not perform as well as the filtering approach, but it reemphasizes that, when distributed specificity is used as the word representation, very little data is required to train an ATE classifier. This encourages the development of more minimally supervised ATE algorithms in the future.
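The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes that a word's distributed specificity is the concatenation of its domain-corpus and general-corpus embeddings, and it uses a nearest-centroid classifier in place of whatever classifier the article trains. All function names, the toy vectors, and the candidate lists are hypothetical.

```python
from math import sqrt

def specificity_vector(word, domain_vecs, general_vecs):
    """Concatenate a word's domain and general embeddings (assumed combination)."""
    dim_general = len(next(iter(general_vecs.values())))
    # Assumption: words unseen in the general corpus get a zero vector.
    general = general_vecs.get(word, [0.0] * dim_general)
    return list(domain_vecs[word]) + list(general)

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_filter(labelled, domain_vecs, general_vecs):
    """Build one centroid for terms and one for non-terms from a few labelled words."""
    terms = [specificity_vector(w, domain_vecs, general_vecs)
             for w, is_term in labelled.items() if is_term]
    others = [specificity_vector(w, domain_vecs, general_vecs)
              for w, is_term in labelled.items() if not is_term]
    return centroid(terms), centroid(others)

def apply_filter(candidates, model, domain_vecs, general_vecs):
    """Keep candidates whose specificity vector is closer to the term centroid."""
    term_centroid, other_centroid = model
    kept = []
    for word in candidates:
        if word not in domain_vecs:
            continue  # no domain embedding, so the word cannot be scored
        vec = specificity_vector(word, domain_vecs, general_vecs)
        if distance(vec, term_centroid) < distance(vec, other_centroid):
            kept.append(word)
    return kept

# Toy 2-dimensional embeddings (invented values, for illustration only).
domain_vecs = {"derivative": [0.9, 0.1], "integral": [0.8, 0.2],
               "tangent": [0.85, 0.15], "the": [0.1, 0.9],
               "page": [0.2, 0.8], "also": [0.15, 0.85]}
general_vecs = {"derivative": [0.1, 0.1], "integral": [0.2, 0.1],
                "tangent": [0.15, 0.1], "the": [0.9, 0.9],
                "page": [0.8, 0.8], "also": [0.85, 0.9]}

# A handful of labelled words stands in for "minimal training".
labelled = {"derivative": True, "integral": True, "the": False, "page": False}
model = train_filter(labelled, domain_vecs, general_vecs)

# Candidate lists from two hypothetical ATE methods, merged before filtering.
extractor_a = ["tangent", "also"]
extractor_b = ["tangent"]
merged = sorted(set(extractor_a) | set(extractor_b))
filtered = apply_filter(merged, model, domain_vecs, general_vecs)
```

Because the filter only consumes a list of candidate words, it can sit behind any extractor or any union of extractors, which is the property the abstract highlights.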
Article outline
- 1. Introduction
- 2. Related work
- 3. Corpus
- 4. Methodology
- 4.1 Specificity vector
- 4.2 Filtering approach
- 4.3 Direct approach
- 5. Annotation
- 6. Experiments and results
- 7. Conclusion
- 8. Future work
- Notes
