Evaluating the extraction of Italian institutional terminology: A comparative study between Sketch Engine and ChatGPT-4o

Ortiz-Garduño, Helena

doi:10.1075/term.25041.ort

Article published in: Terminology: Online-First Articles



Helena Ortiz-Garduño | University of Granada

Published online: 26 February 2026

https://doi.org/10.1075/term.25041.ort

Abstract

In institutional settings, terminology management is essential for efficient communication. Traditionally, this task has been carried out manually or with corpus analysis tools. However, recent advances in generative artificial intelligence have opened new avenues for automating terminographic tasks. In this context, this study proposes the use of generative models for extracting specialised terminology in academic institutions. Specifically, it compares two approaches to terminology extraction: a corpus-based approach using Sketch Engine and an approach based on generative artificial intelligence. To this end, UniPDTerms was implemented: a chatbot built on ChatGPT-4o, specialised in the institutional terminology of the University of Padua (Italy) and fed with an ad hoc corpus. Both systems were evaluated against a reference list using precision, recall, F-score and mean reciprocal rank (MRR). The results indicate that Sketch Engine and UniPDTerms performed at a comparable level under identical evaluation conditions. Although the two systems rely on different extraction mechanisms, they yield similar results: Sketch Engine extracts relevant term candidates using frequency-based corpus analysis, whereas UniPDTerms draws on contextual and semantic relations. These results highlight the potential of incorporating generative artificial intelligence into terminographic workflows, offering new possibilities for improving efficiency and supporting end-users in terminology management.

Keywords: terminology extraction, generative artificial intelligence, chatbots, ChatGPT-4o, Sketch Engine
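
The evaluation named in the abstract (precision, recall, F-score and MRR against a reference list) can be illustrated with a minimal Python sketch. It is not the article's implementation: the function names, the normalisation step, the balanced F-score (F1) and the sample Italian terms are all assumptions made for illustration. In particular, MRR is read here as the mean, over all reference terms, of the reciprocal of each term's rank in the candidate list (0 when the term is not extracted); the article may define it differently.

```python
from typing import Sequence


def normalise(term: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match (assumed step)."""
    return " ".join(term.lower().split())


def evaluate(candidates: Sequence[str], reference: Sequence[str]) -> dict[str, float]:
    """Score a ranked list of term candidates against a reference list."""
    # Deduplicate candidates while keeping their ranking order.
    ranked = list(dict.fromkeys(normalise(c) for c in candidates))
    gold = {normalise(r) for r in reference}

    hits = [c for c in ranked if c in gold]
    precision = len(hits) / len(ranked) if ranked else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    # Balanced F-score (F1); the article may weight precision and recall differently.
    f_score = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )

    # Assumed MRR reading: average of 1/rank per reference term, 0 if missing.
    rank_of = {c: i for i, c in enumerate(ranked, start=1)}
    mrr = (
        sum(1.0 / rank_of[g] for g in gold if g in rank_of) / len(gold)
        if gold
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f_score": f_score, "mrr": mrr}


if __name__ == "__main__":
    # Hypothetical data, not taken from the UniPD corpus or the article's results.
    extracted = ["Corso di laurea", "ateneo", "anno", "dottorato di ricerca"]
    reference_list = ["ateneo", "corso di laurea", "dipartimento", "dottorato di ricerca"]
    print(evaluate(extracted, reference_list))
```

With the hypothetical lists above, three of the four candidates appear in the reference list, so precision, recall and F-score are all 0.75, and the MRR under the assumed definition is 0.4375. Applying the same function to the output of each tool is one way to keep the evaluation conditions identical across systems.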

Article outline

1. Introduction
2. Theoretical framework
- 2.1 Institutional terminology management
- 2.2 Terminology extraction process
- 2.3 Evaluation of terminology extraction
- 2.4 Artificial intelligence and generative chatbots in terminology management
3. Materials and methods
- 3.1 UniPD corpus
- 3.2 Terminology extraction tools
  - 3.2.1 Sketch Engine
  - 3.2.2 ChatGPT-4o
- 3.3 Evaluation
  - 3.3.1 Terminology extraction with Sketch Engine
  - 3.3.2 Terminology extraction with UniPDTerms
4. Results and discussion
- 4.1 Quantitative analysis of terminology extraction with Sketch Engine and ChatGPT-4o
- 4.2 Quantitative comparison between Sketch Engine and ChatGPT-4o
- 4.3 Qualitative analysis of extracted term candidates
5. Conclusions
Notes
References
