Article published in: International Journal of Corpus Linguistics
Vol. 22:2 (2017), pp. 242–269
Discourse markers and (dis)fluency in English and French
Variation and combination in the DisFrEn corpus
Published online: 16 October 2017
https://doi.org/10.1075/ijcl.22.2.04cri
Abstract
While discourse markers (DMs) and (dis)fluency have been extensively studied as separate phenomena, corpus-based research combining large-scale yet fine-grained annotations of both categories has never been carried out. Integrating these two levels of analysis, while methodologically challenging, is not only innovative but also highly relevant to the investigation of spoken discourse in general and of form-meaning patterns in particular. The aim of this paper is to provide corpus-based evidence that DMs and other disfluencies (e.g. pauses, repetitions) are register-sensitive and tend to combine in recurrent clusters. These claims are supported by quantitative findings on the variation and combination of DMs with other (dis)fluency devices in DisFrEn, a richly annotated, comparable English-French corpus representative of eight different interaction settings. Adopting a usage-based approach to meaning-in-context, the analysis reveals the prominent place of DMs within (dis)fluency, as well as meaningful association patterns between forms and functions.
Keywords: discourse markers, disfluency, corpus annotation, usage-based, speech
Article outline
- 1. Introduction
- 2. Discourse markers and (dis)fluency in corpus linguistics
- 2.1 Corpora annotated with DM information
- 2.1.1 Definition of DMs
- 2.1.2 Crosslinguistic research on DMs: From case studies to full categories
- 2.1.3 Annotating the functions of discourse markers
- 2.2 Corpora annotated with (dis)fluency information
- 2.2.1 Brief overview of (dis)fluency annotation frameworks
- 2.2.2 Discourse markers in previous approaches to (dis)fluency
- 2.3 A usage-based approach to the integration of DMs and (dis)fluency
- 3. Methodology
- 3.1 Dataset construction
- 3.2 Annotation schemes: A bottom-up approach to corpus annotation
- 3.2.1 Discourse markers: Multi-layered annotation
- 3.2.2 (Dis)fluency: Word-level tagging
- 4. Results
- 4.1 Quantitative analysis: Variation of DMs and fluencemes
- 4.2 Qualitative analysis: Usage-based schemata of forms and functions
- 5. Summary and discussion
- Acknowledgements
- Notes
