A data-driven approach to anglicism identification in Norwegian

Losnegaard, Gyri Smørdal; Lyse, Gunn Inger

doi:10.1075/scl.49.07los

In: Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian
Edited by Gisle Andersen
[Studies in Corpus Linguistics 49] 2012
► pp. 131–154

Gyri Smørdal Losnegaard | University of Bergen

Gunn Inger Lyse | University of Bergen

Published online: 23 March 2012

https://doi.org/10.1075/scl.49.07los

Anglicisms are words of English origin that have entered Norwegian, either denoting conceptual innovations such as interface or denoting existing concepts in parallel with their Norwegian counterparts (boots). In this chapter we investigate whether machine-learning methods could improve the anglicism component of the classification tool that is currently used to categorize new words appearing in the Norwegian Newspaper Corpus. We derive classification features by extracting three-character sequences (trigrams) from long lists of uniquely English and uniquely Norwegian words. Next, we test two frequency-based approaches and one statistics-based approach to selecting features from this initial pool of trigrams. Finally, using the TiMBL memory-based learning system, we train a classifier with our selections of trigrams, identifying the sets of trigrams that are most predictive of anglicisms. The results show that the data-driven frequency approach, although not sufficient as a stand-alone method for automatic anglicism identification, provides a valuable supplement to the existing knowledge-based classification tool.
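
The pipeline summarized in the abstract (trigram extraction, feature selection, memory-based classification) can be illustrated with a minimal Python sketch. It is not the chapter's implementation and does not use TiMBL. The word lists are toy data, the boundary marker `#`, the function names, and the selection criterion (absolute difference in trigram counts between the two lists) are all assumptions made for illustration, and a plain 1-nearest-neighbour lookup with an overlap distance stands in for the memory-based learner.

```python
from collections import Counter


def trigrams(word):
    """Return the character trigrams of a word, padded with '#' at both ends."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def select_features(english_words, norwegian_words, top_n=None):
    """Rank trigrams by how much their counts differ between the two lists.

    Assumed criterion for illustration only; top_n=None keeps every trigram.
    """
    en = Counter(t for w in english_words for t in trigrams(w))
    no = Counter(t for w in norwegian_words for t in trigrams(w))
    diff = {t: abs(en[t] - no[t]) for t in set(en) | set(no)}
    # Sort by descending difference, then alphabetically for a stable order.
    return sorted(diff, key=lambda t: (-diff[t], t))[:top_n]


def vectorize(word, features):
    """Encode a word as a binary vector: 1 if the trigram occurs in it."""
    present = set(trigrams(word))
    return tuple(int(f in present) for f in features)


def classify(word, features, training):
    """Return the label of the stored example closest to the word.

    Distance is the number of feature positions that differ (overlap metric).
    """
    vec = vectorize(word, features)

    def distance(item):
        return sum(a != b for a, b in zip(vec, item[0]))

    return min(training, key=distance)[1]


# Toy word lists (hypothetical, far smaller than any realistic resource).
english = ["interface", "boots", "shopping", "weekend"]
norwegian = ["grensesnitt", "støvler", "handel", "helg"]

features = select_features(english, norwegian)
training = [(vectorize(w, features), "en") for w in english]
training += [(vectorize(w, features), "no") for w in norwegian]

print(classify("bootcamp", features, training))
```

With lists this small the predicted label for an unseen word is not meaningful; the sketch only shows how a pool of trigrams becomes a feature set and how stored examples are consulted at classification time.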

Cited by two other publications

Miloshevska, Lina

2025. Extraction of Anglicisms from a Corpus of Macedonian Magazine Texts. English Studies at NBU 11:1 ► pp. 141 ff.

Cierpich-Kozieł, Agnieszka, Elżbieta Mańczak-Wohlfeld & Alicja Witalisz

2023. English-Sourced Direct and Indirect Borrowings in a New Lexicon of Polish Anglicisms. Studies in Polish Linguistics 18:1 ► pp. 1 ff.

This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.