In: Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian
Edited by Gisle Andersen
[Studies in Corpus Linguistics 49] 2012
pp. 79–110
Collocations and statistical analysis of n-grams
Multiword expressions in newspaper text
Published online: 23 March 2012
https://doi.org/10.1075/scl.49.05lys
Multiword expressions (MWEs) are words that co-occur so often that they are perceived as a linguistic unit. Since MWEs pervade natural language, their identification is pertinent for a range of tasks within lexicography, terminology and language technology. We apply various statistical association measures (AMs) to word sequences from the Norwegian Newspaper Corpus (NNC) in order to rank two- and three-word sequences (bigrams and trigrams) in terms of their tendency to co-occur. The results show that some statistical measures favour relatively frequent MWEs (e.g. i motsetning til ‘as opposed to’), whereas other measures favour relatively low-frequency units, which typically comprise loan words (de facto), technical terms (notarius publicus) and phrasal anglicisms (practical jokes; cf. G. Andersen, this volume). On this basis we evaluate the relevance of each of these measures for lexicography, terminology and language technology purposes.
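The contrast between measures that favour frequent units and measures that favour rare ones can be illustrated with a minimal sketch. The abstract does not name the association measures used, so the two shown here, pointwise mutual information (PMI) and the log-likelihood ratio, are assumptions chosen only because they are common in collocation work. The token list is an invented toy example, not NNC data, and the names `bigram_scores` and `TOY_TOKENS` are hypothetical.

```python
import math
from collections import Counter

# Invented toy input (NOT corpus data): two frequent frames plus one rare pair.
TOY_TOKENS = (
    ["i", "motsetning", "til", "x"] * 4
    + ["i", "dag", "til", "y"] * 4
    + ["de", "facto"]
)


def bigram_scores(tokens):
    """Return {bigram: {"freq", "pmi", "ll"}} for all adjacent word pairs."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())  # total number of bigram tokens
    first = Counter()   # how often each word fills the first slot
    second = Counter()  # how often each word fills the second slot
    for (w1, w2), freq in bigrams.items():
        first[w1] += freq
        second[w2] += freq

    scores = {}
    for (w1, w2), o11 in bigrams.items():
        f1, f2 = first[w1], second[w2]
        # PMI: log2 of observed over expected co-occurrence frequency.
        pmi = math.log2(o11 * n / (f1 * f2))
        # 2x2 contingency table: observed cells with their row/column totals.
        cells = [
            (o11, f1, f2),
            (f1 - o11, f1, n - f2),
            (f2 - o11, n - f1, f2),
            (n - f1 - f2 + o11, n - f1, n - f2),
        ]
        # Log-likelihood ratio: 2 * sum(O * ln(O / E)), skipping empty cells.
        ll = 2 * sum(
            o * math.log(o / (row * col / n)) for o, row, col in cells if o > 0
        )
        scores[(w1, w2)] = {"freq": o11, "pmi": pmi, "ll": ll}
    return scores


if __name__ == "__main__":
    scores = bigram_scores(TOY_TOKENS)
    for measure in ("pmi", "ll"):
        ranked = sorted(scores, key=lambda b: scores[b][measure], reverse=True)
        print(measure, ranked[:3])
```

On this toy input, PMI ranks the single occurrence of de facto above the repeated i motsetning, while the log-likelihood ratio orders the two the other way round. This mirrors, in miniature and under the stated assumptions, the split between measures that reward rare, exclusive pairings and measures that reward well-attested ones.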
Cited by six other publications
Szczerbowicz, Wojciech
Akundi, Aditya & Oscar Mondragon
Andersen, Gisle
2022. Utilising heterogeneous language resources for term extraction in maritime domains. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 28:1, pp. 1 ff.
Maisto, Alessandro
Dione, Cheikh Bamba & Christer Johansson
This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
