Article published in: International Journal of Corpus Linguistics
Vol. 28:2 (2023), pp. 144–171
Annotating dialogue acts in speech data
Problematic issues and basic dialogue act categories
Published online: 8 August 2022
https://doi.org/10.1075/ijcl.20165.ver
Abstract
This paper aims to identify the most problematic issues in dialogue act annotation of speech
corpora and to define basic categories of dialogue acts. I critically examine and test four generic schemes that represent different
lines of dialogue act annotation: AMI, DART, ISO 24617-2 and SWBD-DAMSL. The most problematic issues concern
the distinction between the semantic and pragmatic meanings of utterances, the annotation
of metadiscourse, and the adequacy and informativeness of the tagset. The basic dialogue act categories identified are information
providing, information seeking, actions, social acts and metadiscourse. These findings can help improve dialogue act annotation.
Article outline
- 1. Introduction
- 2. Dialogue act annotation schemes
- 3. Methodology
- 3.1 Selecting dialogue act annotation schemes
- 3.2 Test data
- 3.3 Annotation process
- 3.4 Analytical procedure
- 4. Dialogue act annotation
- 4.1 Applicability to a new language
- 4.2 Utterance meaning
- 4.3 Ambiguity
- 4.3.1 Basic unit
- 4.3.2 Tags
- 4.4 Adequacy
- 4.5 Informativeness
- 5. Dialogue act categories
- 5.1 Information-providing acts
- 5.2 Information-seeking acts
- 5.3 Action acts
- 5.4 Social acts
- 5.5 Metadiscourse acts
- 6. Conclusions
