Building a large corpus based on newspapers from the web

Andersen, Gisle; Hofland, Knut

doi:10.1075/scl.49.01and

In: Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian
Edited by Gisle Andersen
[Studies in Corpus Linguistics 49] 2012
pp. 1–28

Gisle Andersen | NHH Norwegian School of Economics

Knut Hofland | Uni Computing

Published online: 23 March 2012

The Norwegian Newspaper Corpus (NNC) is an initiative to create a large monitor corpus representing contemporary Norwegian in both its written varieties, Bokmål and Nynorsk. The corpus is compiled through daily harvesting and processing of texts published in the web editions of Norwegian newspapers. This introductory chapter surveys the work on corpus building, tool development and research carried out in connection with the NNC project. It provides an overview of the corpus and its system architecture, describing the workflow, tools and methods used in the data processing. The chapter also presents the individual research contributions to this volume.

Cited by (4)

de Smedt, Koenraad

2025. Searching for the progressive in treebanks. In The Progressive Revisited [Studies in Language Companion Series, 236], pp. 190 ff.

Andersen, Gisle

2022. Utilising heterogeneous language resources for term extraction in maritime domains. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 28:1, pp. 1 ff.

Jahnsen, Synnøve Økland, Anje Müller Gjesdal & Gisle Andersen

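2022. Om fremveksten av en ny forståelsesramme for arbeidslivet og kampen mot arbeidslivskriminalitet [On the emergence of a new framework for understanding working life and the fight against work-related crime]. Norsk sosiologisk tidsskrift 6:5, pp. 8 ff.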

Kristiansen, Marita

2014. Concept change, term dynamics and culture-boundness in economic-administrative domains (chapter 11). In Dynamics and Terminology [Terminology and Lexicography Research and Practice, 16], pp. 235 ff.

This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.