Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German

Hedeland, Hanna; Schmidt, Thomas

doi:10.1075/hsm.14.04hed

In: Multilingual Corpora and Multilingual Corpus Analysis
Edited by Thomas Schmidt and Kai Wörner
[Hamburg Studies on Multilingualism 14] 2012
pp. 25–46

Published online: 15 November 2012

https://doi.org/10.1075/hsm.14.04hed

This article discusses questions concerning the creation, annotation and sharing of spoken language corpora. We use the Hamburg Map Task Corpus (HAMATAC), a small corpus in which advanced learners of German were recorded solving a map task, as an example to illustrate our main points. We first give an overview of the corpus creation and annotation process including recording, metadata documentation, transcription and semi-automatic annotation of the data. We then discuss the manual annotation of disfluencies as an example case in which many of the typical and challenging problems for data reuse – in particular the reliability of interpretative annotations – are revealed.

Cited by three other publications

Frick, Elena & Thomas Schmidt

2025. Querying spoken language data. In Harmonizing language data, pp. 339 ff.

Hirschmann, Hagen & Thomas Schmidt

2022. Gesprochene Lernerkorpora: Methodisch-technische Aspekte der Erhebung, Erschließung und Nutzung. Zeitschrift für germanistische Linguistik 50:1, pp. 36 ff.

Wisniewski, Katrin

2022. Gesprochene Lernerkorpora des Deutschen: Eine Bestandsaufnahme. Zeitschrift für germanistische Linguistik 50:1, pp. 1 ff.

This list is based on CrossRef data as of 11 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.