Article published in: Terminology
Vol. 28:1 (2022), pp. 1–36
Utilising heterogeneous language resources for term extraction in maritime domains
Published online: 10 September 2021
https://doi.org/10.1075/term.20024.and
Abstract
Developing terminologies for domains that lack them is time-consuming and costly. This article addresses a
general methodological question: with limited funding, how can existing language resources be utilised to the fullest
to create a terminology at relatively low cost? Although Norway has been an important player in the maritime industries
for many centuries, it has not prioritised the systematic development of an official maritime terminology. The article
therefore focuses specifically on efforts to develop a national terminological resource for maritime domains. It describes
the creation of a corpus of popular science texts and a parallel corpus of technical texts. Six different term
extraction methods are applied. These include corpus-based statistical analyses of frequency, collocation and keyness, as well as
bilingual term extraction. Finally, the pros and cons of each method are evaluated by means of a cost-benefit analysis.
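To illustrate the kind of keyness analysis mentioned in the abstract, the following is a minimal sketch that compares word frequencies in a domain-specific corpus with those in a general reference corpus. It assumes the log-likelihood statistic, which is widely used for keyness; the abstract does not say which statistic the authors used. The function names, the `min_freq` threshold and the toy English sentences are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) score for one word.

    a: frequency in the domain corpus, b: frequency in the reference corpus,
    c: size of the domain corpus in tokens, d: size of the reference corpus.
    """
    expected_domain = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_domain)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keyness_ranking(domain_tokens, reference_tokens, min_freq=2):
    """Rank words that are overrepresented in the domain corpus."""
    domain = Counter(domain_tokens)
    reference = Counter(reference_tokens)
    n_domain = sum(domain.values())
    n_reference = sum(reference.values())
    scored = []
    for word, a in domain.items():
        if a < min_freq:
            continue  # skip words too rare to be reliable candidates
        b = reference.get(word, 0)
        if a / n_domain <= b / n_reference:
            continue  # keep only words relatively more frequent in the domain
        scored.append((word, log_likelihood(a, b, n_domain, n_reference)))
    # Highest score first; ties are broken alphabetically
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy data (hypothetical): a tiny "maritime" text and a tiny "general" text
domain_text = "the trawler hauled the trawl net while the trawler crew checked the winch"
reference_text = "the cat sat on the mat while the dog watched the door and the crew left"
ranking = keyness_ranking(domain_text.split(), reference_text.split())
for word, score in ranking:
    print(f"{word}\t{score:.2f}")
```

On realistic corpora the ranked list would still only contain term candidates, which need manual validation against criteria for unithood and termhood.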
Article outline
- 1. Introduction
- 2. Historical and theoretical background
- 3. Methods and criteria for term extraction in maritime domains
- 3.1 Maritime domains
- 3.2 Overview of term extraction methods
- 3.3 Criteria for unithood and termhood
- 4. Methodological specifics and results from the various term extraction methods
- 4.1 Method 1: Frequency analysis of domain-specific corpus
- 4.2 Method 2: Keyness analysis of domain-specific vs. general corpus
- 4.3 Method 3: Collocation analysis of domain-specific corpus
- 4.4 Method 4: Chunking of aligned sentences from a parallel domain-specific corpus
- 4.5 Method 5: Retrieval of terms from domain-specific lexical resources
- 4.6 Method 6: Retrieval of domain-specific entries in bilingual general dictionary
- 5. Results
- 6. Concluding remarks
- Acknowledgements
- Notes
- References
Cited by (3)
Wang, Xiaowen, Chu-Ren Huang & Chongxuan Guo
Wissik, Tanja
2025. Impact of automatic term extraction on terminology work. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 31:1, pp. 110 ff.
Dong, Jihua, Shuai Dong & Louisa Buckingham
2023. A discourse dynamics exploration of terminology for Covid-19 in professional and public discourse. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 29:2, pp. 224 ff.
This list is based on CrossRef data as of 6 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
