In: Spoken Corpora and Linguistic Studies
Edited by Tommaso Raso and Heliana Mello
[Studies in Corpus Linguistics 61] 2014
pp. 331–364
The notion of sentence and other discourse units in corpus annotation
Published online: 14 November 2014
https://doi.org/10.1075/scl.61.12pie
The notion of sentence – as it is defined in syntactic, semantic, graphic and prosodic terms – is not a suitable maximal unit for the prosodic and syntactic annotation of spoken corpora. Still, this notion is taken as a reference in many syntactic and prosodic annotation systems. We present here the modular approach we adopted for the annotation of the Rhapsodie corpus of spoken French, which led us to distinguish three types of elementary units operating in discourse (government units, illocutionary units, and intonational periods) and to annotate them separately. We describe the types of interactions identified among these various levels of cohesion. On this basis we propose a reappraisal of the traditional notion of sentence and we define two additional types of discourse units that we consider as the minimal and the maximal span for the notion of sentence.
References (68)
Aijmer, Karin. 2002. English Discourse Particles. Evidence from a Corpus [Studies in Corpus Linguistics 10]. Amsterdam: John Benjamins.
Andersen, Hanne Leth & Nølke, Henning (eds). 2002. Macro-syntaxe et macro-sémantique, Actes du colloque international d’Århus, 17–19 mai 2001. Bern: Peter Lang.
Avanzi, Mathieu, Lacheret, Anne & Victorri, Bernard. 2008. Analor. A tool for semi-automatic annotation of French prosodic structure. In Speech Prosody 2008, Campinas, Brazil, 119–122.
Bazzanella, Carla. 1995. I segnali discorsivi. In Grande grammatica italiana di consultazione, Vol. III, Lorenzo Renzi, Giampaolo Salvi & Anna Cardinaletti (eds), 225–257. Bologna: Il Mulino.
Beckman, Mary E. & Elam, Gayle Ayers. 1997. Guidelines for ToBI Labelling, version 3. Columbus OH: The Ohio State University Research Foundation.
Benzitoun, Christoph, Dister, Anne, Gerdes, Kim, Kahane, Sylvain, Pietrandrea, Paola & Sabio, Frédéric. 2010. Tu veux couper là faut dire pourquoi. Propositions pour une segmentation syntaxique du français parlé. In Actes du Congrès Mondial de Linguistique Française (CMLF 2010), New Orleans.
Blanche-Benveniste, Claire, Borel, Bernard, Deulofeu, José, Durand, Jacky, Giacomi, Alain, Loufrani, Claude, Meziane, Boudjema & Pazery, Nelly. 1979. Des grilles pour le français parlé. Recherches sur le Français Parlé 2: 163–205.
Blanche-Benveniste, Claire. 1990. Un modèle d’analyse syntaxique ‘en grilles’ pour les productions orales. Anuario de Psicologia 47: 11–28.
Blanche-Benveniste, Claire, Bilger, Mireille, Rouget, Christine & Van den Eynde, Karel. 1990. Le français parlé. Etudes grammaticales. Paris: Editions du Centre National de la Recherche Scientifique.
Böhmová, Alena, Hajič, Jan, Hajičová, Eva & Hladká, Barbora. 2003. The PDT: A 3-level annotation scenario. In Treebanks: Building and Using Parsed Corpora, Anne Abeillé (ed.), 103–127. Dordrecht: Kluwer.
Bonvino, Elisabetta, Masini, Francesca & Pietrandrea, Paola. 2009. List Constructions: A semantic network. In Troisième Conférence Internationale de l’AFLiCo, Nanterre. [URL]
Bourigault, Didier. 2007. Un analyseur syntaxique opérationnel: SYNTEX. Habilitation à Diriger les Recherches, Université Toulouse-Le Mirail.
Brown, Penelope & Levinson, Stephen. 1978. Universals in language use: Politeness phenomena. In Questions and Politeness: Strategies in Social Interaction, Esther Goody (ed.), 56–310. Cambridge: CUP.
Chafe, Wallace L. 1998. Language and the flow of thought. In The New Psychology of Language, Michael Tomasello (ed.), 93–111. Hillsdale NJ: Lawrence Erlbaum Associates.
. 2005. Enunciato e frase. Teoria e verifiche empiriche. In Italia linguistica: discorsi di scritto e di parlato. Scritti in onore di Giovanni Nencioni, Marco Biffi, Omar Calabrese & Luciana Salibra (eds). Siena: Prolagon.
Degand, Liesbeth & Simon, Anne Catherine. 2009. On identifying basic discourse units in speech: Theoretical and empirical issues. Discours 4. [URL]
Dehé, Nicole & Kavalova, Yordanka. 2006. The syntax, pragmatics, and prosody of parenthetical what. English Language and Linguistics 10: 289–320.
Delais-Roussarie, Élisabeth. 2005. Phonologie et grammaire: Etudes et modélisation des interfaces prosodiques. Habilitation à diriger des recherches, Université de Toulouse-le Mirail.
Deulofeu, José. 1999. Recherches sur les formes de la prédication dans les énoncés assertifs en français contemporain (le cas des énoncés introduits par le morphème que). Thèse d’État, Université Paris 3.
Deulofeu, José, Dufort, Lucie, Gerdes, Kim, Kahane, Sylvain & Pietrandrea, Paola. 2010. Depends on what the French say: Spoken corpus annotation with and beyond syntactic function. In 4th Linguistic Annotation Workshop (LAW IV), ACL, Uppsala, 274–281.
Gerdes, Kim. 2013. Collaborative dependency annotation. In Proceedings of Depling, Prague, 88–97.
Gerdes, Kim & Kahane, Sylvain. 2009. Speaking in piles. Paradigmatic annotation of a Spoken French Corpus. In 5th Corpus Linguistics Conference, Liverpool. [URL]
Hajič, Jan. 1998. Building a syntactically annotated corpus: The Prague Dependency Treebank. In Issues of Valency and Meaning. Studies in Honour of Jarmila Panevová, Eva Hajičová (ed.), 106–132. Prague: Karolinum.
’t Hart, Johan, Collier, René & Cohen, Antonie. 1990. Perceptual Study of Intonation: An Experimental-Phonetic Approach to Speech Melody. Cambridge: CUP.
Hasegawa-Johnson, Mark, Chen, Ken, Cole, Jennifer, Borys, Sarah, Kim, Sung-Suk, Cohen, Aaron, Zhang, Tong, Choi, Jeung-Yoon, Kim, Heejin, Yoon, Taejin & Chavarria, Sandra. 2005. Simultaneous recognition of words and prosody in the Boston University Radio Speech Corpus. Speech Communication 46(3–4): 418–439.
Holmes, Janet. 1986. Functions of you know in women’s and men’s speech. Language in Society 15: 1–21.
Kahane, Sylvain. 2012. De l’analyse en grille à la modélisation des entassements. In Penser les langues avec Claire Blanche-Benveniste, Sandrine Caddeo, Marie-Noëlle Roubaud, Magali Rouquier & Frédéric Sabio (eds), 101–116. Aix-en-Provence: Presses de l’université de Provence.
Kahane, Sylvain. 2013. Tutoriel codage microsyntaxique. [URL]
Kahane, Sylvain & Pietrandrea, Paola. 2012a. La typologie des entassements en français. In Actes du 3ème congrès mondial de linguistique française (CMLF), Lyon, 1809–1828.
Kahane, Sylvain & Pietrandrea, Paola. 2012b. Les parenthétiques comme ‘Unités Illocutoires Associées’: Une perspective macrosyntaxique. Linx 61: 49–70.
Kärkkäinen, Elise. 2003. Epistemic Stance in English Conversation. A Description of its Interactional Functions, with a Focus on I Think [Pragmatics & Beyond New Series 115]. Amsterdam: John Benjamins.
Lacheret, Anne, Kahane, Sylvain, Pietrandrea, Paola, Avanzi, Mathieu & Victorri, Bernard. 2011. Oui mais elle est où la coupure, là? Quand syntaxe et prosodie s’entraident ou se complètent. Langue Française 170: 61–80.
Lacheret, Anne, Kahane, Sylvain, Pietrandrea, Paola, Obin, Nicolas, Belião, Julie, Tchobanov, Atanas, Gerdes, Kim & Goldman, Jean-Philippe. 2014. Rhapsodie: A prosodic-syntactic treebank for spoken French. In 9th Language Resources and Evaluation Conference, Reykjavik, Iceland, 26–31 May 2014.
Loufrani, Claude. 1984. Le locuteur collectif. Typologie de configurations discursives. Recherches sur le Français Parlé 6: 169–193.
Miller, Jim & Weinert, Regina. 1998[2009]. Spontaneous Spoken Language. Syntax and Discourse. Oxford: OUP.
Nivre, Joakim. 2008. Treebanks. In Corpus Linguistics, Anke Lüdeling & Merja Kytö (eds), 225–24. Berlin: Mouton de Gruyter.
Nølke, Henning & Adam, Jean-Michel (eds). 1999. Approches modulaires, de la langue au discours. Lausanne: Delachaux et Niestlé.
Nølke, Henning. 1990. Recherches sur les adverbes: Bref aperçu historique des travaux de classification. Langue Française 88: 117–122.
Ostendorf, Mari, Shafran, Izhak, Shattuck-Hufnagel, Stefanie, Carmichael, Leslie & Byrne, William. 2001. A prosodically labeled database of spontaneous speech. In Proceedings ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding, Red Bank, NJ.
Östman, Jan-Ola. 1981. ‘You know’. A Discourse-functional Approach [Pragmatics & Beyond II:7]. Amsterdam: John Benjamins.
Ross, John R. 1973. Slifting. In The Formal Analysis of Natural Language, Maurice Gross, Morris Halle & Marcel P. Schützenberger (eds), 133–169. Berlin: Mouton.
Rossi, Mario. 1979. Le français, langue sans accent? In L’accent en français contemporain [Studia Phonetica 15], Ivan Fonagy & Pierre Léon (eds), 13–51. Paris: Didier.
Roulet, Eddy, Filliettaz, Laurent, Grobet, Anne & Burger, Marcel. 2001. Un modèle et un instrument d’analyse de l’organisation du discours [Collection Sciences pour la Communication]. Bern: Peter Lang.
Sabio, Frédéric. 2006. Phrases et constructions verbales: Quelques remarques sur les unités syntaxiques dans le français parlé. In Constructions verbales et production de sens, Daniel Lebaud, Catherine Paulin & Katja Ploog (eds). Besançon: Presses Universitaires de Franche-Comté.
Schelfhout, Carla, Coppen, Peter-Arno & Oostdijk, Nelleke. 2004. Finite comment clauses in Dutch: A corpus-based approach. Journal of Germanic Linguistics 16: 331–349.
Selkirk, Elisabeth. 2005. Comments on intonational phrasing. In Prosodies, Sónia Frota, Marina Vigário & M. João Freitas (eds), 11–58. Berlin: Mouton de Gruyter.
Villemonte de La Clergerie, Eric. 2005. DyALog: A tabular logic programming based environment for NLP. In 2nd International Workshop on Constraint Solving and Language Processing (CSLP’05), Barcelona, Spain.
Resources
Avanzi, Mathieu. 2012. L’interface prosodie/syntaxe en français: Dislocations, incises et asyndètes. Bern: Peter Lang.
Avanzi, Mathieu, Simon, Anne Catherine, Goldman, Jean-Philippe & Auchlin, Antoine. 2010. C-PROM. Un corpus de français parlé annoté pour l’étude des proéminences. In Actes des 23èmes journées d’étude sur la parole, Mons, Belgium, 25–28 May.
Branca-Rosoff, Sonia, Fleury, Serge, Lefeuvre, Florence & Pires, Matthew. 2012. Discours sur la ville. Corpus de Français Parlé Parisien des années 2000 (CFPP2000). [URL]
Durand, Jacques, Laks, Bernard & Lyche, Chantal. 2009. Le projet PFC (phonologie du français contemporain): Une source de données primaires structurées. In Phonologie, variation et accents du français, Jacques Durand, Bernard Laks & Chantal Lyche (eds), 19–61. Paris: Hermès.
Cited by (19)
Agmon, Galit, Manuela Jaeger, Ella Magen, Danna Pinto, Yuval Perelmuter, Elana Zion Golumbic & Martin G. Bleichner
Yang, Linsey C., Wenwei Dong, Nathan Vandeweerd & Jet Hoek
2025. Automatic discourse segmentation of L1 and L2 spoken English transcripts. International Journal of Learner Corpus Research
Gregov, Nicolas
Minoccheri, Chiara, Christophe Combe, Dejan Stosic, F. Neveu, S. Prévost, A. Montébran, A. Steuckardt, G. Bergounioux, G. Merminod & G. Philippe
Wang, Xiaoman & Binhua Wang
Sanguinetti, Manuela, Cristina Bosco, Lauren Cassidy, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah & Amir Zeldes
Mikkelsen, Olaf & Stefan Hartmann
2022. Competing future constructions and the Complexity Principle. In Broadening the Spectrum of Corpus Linguistics [Studies in Corpus Linguistics, 105], pp. 9 ff.
Minoccheri, Chiara, Dejan Stosic, F. Neveu, S. Prévost, A. Steuckardt, G. Bergounioux & B. Hamma
Pausé, Marie-Sophie, Agnès Tutin, Olivier Kraif, Maximin Coavoux, F. Neveu, S. Prévost, A. Steuckardt, G. Bergounioux & B. Hamma
Sabio, Frédéric, Marie-Noëlle Roubaud & Berthille Pallaud
Kibrik, Andrej A., Nikolay A. Korotaev & Vera I. Podlesskaya
2020. The Moscow approach to local discourse structure. In In search of basic units of spoken language [Studies in Corpus Linguistics, 94], pp. 367 ff.
Ferraresi, Adriano & Silvia Bernardini
2019. Building EPTIC. In Parallel Corpora for Contrastive and Translation Studies [Studies in Corpus Linguistics, 90], pp. 123 ff.
Bernardini, Silvia, Adriano Ferraresi, Mariachiara Russo, Camille Collard & Bart Defrancq
Cresti, Emanuela & Massimo Moneglia
2018. The illocutionary basis of information structure. In Information structure in lesser-described languages [Studies in Language Companion Series, 199], pp. 359 ff.
Sabio, Frédéric
Sabio, Frédéric
2018. On the syntax of spoken French. Revue Romane. Langue et littérature. International Journal of Romance Languages and Literatures 53:1, pp. 6 ff.
Kahane, Sylvain & Nicolas Mazziotta
Belião, Julie
[no author supplied]
This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
