In: In Search of Basic Units of Spoken Language: A corpus-driven approach
Edited by Shlomo Izre'el, Heliana Mello, Alessandro Panunzi and Tommaso Raso
[Studies in Corpus Linguistics 94] 2020
pp. 403–432
Chapter 8. Comparing annotations for the prosodic segmentation of spontaneous speech
Focus on reference units
Published online: 18 June 2020
https://doi.org/10.1075/scl.94.17pan
Abstract
This chapter reports a quantitative and qualitative comparison of
seven annotations performed on the same two American English texts: a monologue and a
dialogue. The analysis of these data is complex, since each research group produced its
annotation independently, on the basis of its own theoretical framework.
Despite these differences, the fundamental role of prosody in the analysis of speech emerges
clearly in every annotation. Prosodic breaks can then be viewed as theory-independent
entities. After summarizing the key features of the theoretical models, we derived a unified
tagset and developed a web application (SLAC) to compare the different annotations. Finally,
we measured agreement on prosodic breaks in several ways, with promising
results for the identification of terminal breaks.
Article outline
- 1. Introduction
- 2. Comparing the different theoretical perspectives
- 2.1 Preliminary remarks
- 2.2 The segmentation of the speech flow into discrete units
- 2.3 The relation between prosody and syntax
- 2.4 The nature of the reference units for spoken speech
- 3. The SLAC database
- 3.1 Web interface
- 3.2 The Unified Tagset
- 4. Overall agreement
- 4.1 Starting data and preliminary choices
- 4.2 Interpreting the data
- 4.3 Standard agreement
- 5. Pairwise agreement
- 5.1 ANY: agreement on prosodic break perception
- 5.2 OTB: agreement on terminal break
- 5.3 Strong and weak disagreement
- 5.4 Weighted agreement
- 6. Final remarks
- Acknowledgements
- Notes
- References
- Appendix
