In: Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian
Edited by Gisle Andersen
[Studies in Corpus Linguistics 49] 2012
pp. 31–50
Corpuscle – a new corpus management platform for annotated corpora
Published online: 23 March 2012
https://doi.org/10.1075/scl.49.02meu
Corpuscle is a new corpus query engine and Web-based corpus management system. The main design goals were the ability to handle very large corpora, support for structured data (XML), and seamless integration of manual corpus annotation and editing. New algorithms have been developed, among them a technique for running finite state automata from edges with lowest corpus counts, and an implementation of regular expressions on suffix arrays for fast reverse index lookup. These algorithms allow for a clean and elegant implementation of multi-valued and set-valued attributes. The web interface offers rich functionality for concordancing, collocations, distribution statistics, and more. Queries can be input in a graphical, menu-driven way, freeing the user from dealing with the complexities of the query language.
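The abstract names suffix arrays as the basis for fast lookup of word forms matching a pattern. The following is a minimal sketch of that general idea, not Corpuscle's implementation: it handles only plain substring patterns rather than full regular expressions, and the names `build_suffix_array`, `words_containing`, and the sample lexicon are hypothetical, chosen for illustration.

```python
from bisect import bisect_left


def build_suffix_array(lexicon):
    """Return a sorted list of (suffix, word_id) pairs, one per suffix of each word.

    Assumption: a simple in-memory list stands in for a real suffix array.
    """
    entries = []
    for word_id, word in enumerate(lexicon):
        for start in range(len(word)):
            entries.append((word[start:], word_id))
    entries.sort()
    return entries


def words_containing(entries, lexicon, substring):
    """Find all word types that contain `substring` using two binary searches."""
    # First suffix that is >= the pattern.
    lo = bisect_left(entries, (substring, -1))
    # Suffixes that start with the pattern sort before pattern + U+10FFFF
    # (assuming U+10FFFF itself never occurs in the lexicon).
    hi = bisect_left(entries, (substring + "\U0010ffff", -1))
    return sorted({lexicon[word_id] for _, word_id in entries[lo:hi]})


# Hypothetical sample lexicon of Norwegian word forms.
lexicon = ["avis", "avisen", "nyhet", "visning"]
entries = build_suffix_array(lexicon)
matches = words_containing(entries, lexicon, "vis")
```

Because every occurrence of a substring is a prefix of some suffix, all matching word forms lie in one contiguous range of the sorted list, so the lookup cost grows with the logarithm of the lexicon size plus the number of matches.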
Cited by six other publications
Seitanidi, Eleni, Nele Põldvere & Carita Paradis
Batinić, Josip, Elena Frick & Thomas Schmidt
Põldvere, Nele, Johan Frid, Victoria Johansson & Carita Paradis
Põldvere, Nele, Victoria Johansson & Carita Paradis
Fløttum, Kjersti, Øyvind Gjerstad & Anje Müller Gjesdal
This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
