Article published in: Computational Terminology
Edited by Ayla Rigouts Terryn and Patrick Drouin
[Terminology 31:1] 2025
pp. 110–135
Impact of automatic term extraction on terminology work
A qualitative interview study in institutional settings
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 23 May 2025
https://doi.org/10.1075/term.00085.wis
Abstract
A crucial task in any type of terminology work is identifying and extracting terms from relevant sources, which
can be done manually or via (semi-)automatic term extraction processes. Given the recent advances in automatic term extraction
(ATE) research, this paper explores the impact of ATE on terminology work in institutional settings (academic institutions,
administrations, European institutions and international organizations) based on qualitative data. The analysis of 15
semi-structured expert interviews conducted in 2023 shows that the latest advances in ATE research have not had an immediate
impact on the terminology practices of the study participants in institutional settings. The paper discusses the reasons for
the slow uptake of ATE in institutional settings, such as the gap between ATE tools developed in research and ATE components
integrated in off-the-shelf terminology or corpus management systems, the lack of integration into existing workflows, the lack of
support for certain languages, especially for less-resourced languages, as well as reasons related to source materials.
Article outline
- 1. Introduction
- 2. Tools used for Automatic Term Extraction
- 3. Automatic term extraction in the context of institutional terminology practices
- 4. Method
- 5. Results from the interview data
- 5.1 Approaches
- 5.2 Tools
- 5.3 Integration into workflow
- 5.4 Satisfaction with the results of ATE
- 5.5 Contribution to the development of ATE methods
- 5.6 User training for term extraction tools
- 5.7 Reasons for not using automatic term extraction
- 6. Discussion
- 7. Concluding remarks
- 5.7Reasons for not using automatic term extraction
- 6.Discussion
- 7.Concluding remarks
- Acknowledgments
- Notes
- References
