In: Intonational Grammar in Ibero-Romance: Approaches across linguistic subfields
Edited by Meghan E. Armstrong, Nicholas Henriksen and Maria del Mar Vanrell
[Issues in Hispanic and Lusophone Linguistics 6] 2016
pp. 227–248
Towards automatic language processing and intonational labeling in European Portuguese
Fernando Batista | Instituto de Engenharia de Sistemas e Computadores – Investigação e Desenvolvimento em Lisboa
Isabel Trancoso | Instituto de Engenharia de Sistemas e Computadores – Investigação e Desenvolvimento em Lisboa
Published online: 31 March 2016
https://doi.org/10.1075/ihll.6.11mon
This work describes a framework that encompasses multi-layered linguistic information, with a focus on prosodic features (pitch, energy, and tempo patterns). It uses these features to distinguish between sentence-form types and disfluency/fluency repair events, and it contributes to the characterization of intonational patterns of spontaneous and prepared speech in European Portuguese. Different machine learning methods were applied to discriminate between structural metadata events in university lectures and in map-task dialogues, which contain large amounts of spontaneous speech. Results show that prosodic features, and in particular a set of very informative features, are crucial for distinguishing between sentence-form types and disfluency/fluency repair events. This is the first work for European Portuguese on both the fully automatic processing of multi-layered linguistic descriptions of spoken corpora and intonational labeling.
Keywords: European Portuguese, prosody, speech processing, structural metadata
