In: Corpus-based Studies of Lesser-described Languages: The CorpAfroAs corpus of spoken AfroAsiatic languages
Edited by Amina Mettouchi, Martine Vanhove and Dominique Caubet
[Studies in Corpus Linguistics 68] 2015
pp. 13–41
Representation of speech in CorpAfroAs
Transcriptional strategies and prosodic units
Published online: 20 May 2015
https://doi.org/10.1075/scl.68.01izr
This paper surveys the transcriptional aspects of CorpAfroAs, a spoken corpus of Afroasiatic languages, with a focus on the representation of phonemes, morphemes, words, and longer units. We discuss the distinction between the prosodic, phonological and morphosyntactic word, as well as that between intonation unit, paratone and period. Segmentation and transcription choices are analyzed, and the research advances they make possible are presented: comparing the phonological and the morphosyntactic word allows the systematic study of sandhi and similar phenomena, and of the syntax/phonology interface. Segmentation into prosodic units allows the study of the interfaces of prosody with syntax, information structure, and discourse.
