Article published in: Terminology: Online-First Articles
Evaluating the extraction of Italian institutional terminology
A comparative study between Sketch Engine and ChatGPT-4o
Published online: 26 February 2026
https://doi.org/10.1075/term.25041.ort
Abstract
In institutional settings, terminology management is essential to ensure efficient communication. Traditionally,
this task has been carried out manually or with corpus analysis tools. However, recent advances in generative
artificial intelligence have opened new avenues for automating terminographic tasks. In this context, this study proposes
the use of generative models for extracting specialised terminology in academic institutions and compares two
approaches to terminology extraction: a corpus-based approach using the Sketch Engine tool, and an approach based on
generative artificial intelligence. To this end, UniPDTerms was implemented: a chatbot built on
ChatGPT-4o, specialised in the institutional terminology of the University of Padua (Italy) and fed with an ad hoc
corpus. Both systems were evaluated against a reference list by analysing precision, recall, F-score and mean reciprocal
rank (MRR) for each model. The results indicate that Sketch Engine and UniPDTerms performed at a comparable level under identical
evaluation conditions. Although the two systems use different extraction mechanisms, they yield similar results: Sketch
Engine extracts relevant term candidates using frequency-based corpus analysis, whereas UniPDTerms draws on contextual and
semantic relations. These results highlight the potential of incorporating generative artificial intelligence into terminographic
workflows, offering new possibilities for improving efficiency and supporting end-users in terminology management.
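The evaluation described above, which scores extracted term candidates against a reference list, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes exact matching after case and whitespace normalisation, a balanced F-score (F1), and an MRR averaged over several ranked candidate lists using the rank of the first correct candidate in each. The abstract does not specify any of these choices, and the function names and sample terms are hypothetical.

```python
from typing import Iterable, Sequence


def normalise(term: str) -> str:
    """Case-fold and collapse whitespace (assumed matching rule, not from the article)."""
    return " ".join(term.casefold().split())


def precision_recall_f1(candidates: Iterable[str], reference: Iterable[str]) -> dict:
    """Compare a set of extracted candidates with a gold reference list."""
    gold = {normalise(t) for t in reference}
    extracted = {normalise(t) for t in candidates}
    hits = extracted & gold  # candidates that also appear in the reference list

    precision = len(hits) / len(extracted) if extracted else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    denom = precision + recall
    # Balanced F-score (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def mean_reciprocal_rank(ranked_lists: Sequence[Sequence[str]],
                         reference: Iterable[str]) -> float:
    """Average 1/rank of the first correct candidate in each ranked list.

    A list with no correct candidate contributes 0.
    """
    gold = {normalise(t) for t in reference}
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked in ranked_lists:
        for rank, term in enumerate(ranked, start=1):
            if normalise(term) in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Hypothetical data, for illustration only.
reference_terms = ["ateneo", "corso di laurea", "dottorato di ricerca",
                   "rettore", "senato accademico"]
extracted_terms = ["corso di laurea", "Ateneo", "pagina", "dottorato di ricerca"]

scores = precision_recall_f1(extracted_terms, reference_terms)
mrr = mean_reciprocal_rank([["pagina", "ateneo"], ["rettore"], ["sito"]],
                           reference_terms)
print(scores, mrr)
```

Because both tools are scored against the same reference list with the same functions, the resulting figures are directly comparable, which is the sense of "identical evaluation conditions" in the abstract.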
Article outline
- 1. Introduction
- 2. Theoretical framework
- 2.1 Institutional terminology management
- 2.2 Terminology extraction process
- 2.3 Evaluation of terminology extraction
- 2.4 Artificial intelligence and generative chatbots in terminology management
- 3. Materials and methods
- 3.1 UniPD corpus
- 3.2 Terminology extraction tools
- 3.2.1 Sketch Engine
- 3.2.2 ChatGPT-4o
- 3.3 Evaluation
- 3.3.1 Terminology extraction with Sketch Engine
- 3.3.2 Terminology extraction with UniPDTerms
- 4. Results and discussion
- 4.1 Quantitative analysis of terminology extraction with Sketch Engine and ChatGPT-4o
- 4.2 Quantitative comparison between Sketch Engine and ChatGPT-4o
- 4.3 Qualitative analysis of extracted term candidates
- 5. Conclusions
- Notes
- References