In: Rhapsodie: A prosodic and syntactic treebank for spoken French
Edited by Anne Lacheret-Dujour, Sylvain Kahane and Paola Pietrandrea
[Studies in Corpus Linguistics 89] 2019
pp. 271–283
Chapter 15. Exploration of the Rhapsodie corpus
Data structure, formats and query tools
Published online: 6 June 2019
https://doi.org/10.1075/scl.89.16lac
Abstract
This chapter describes the data structure of the Rhapsodie Treebank and the methodological issues that stem from its complexity. The structure is articulated around three independent, non-aligned hierarchies: microsyntactic, macrosyntactic and prosodic. The chapter discusses the specific problems posed by the simultaneous processing of the phonological stream (prosodic level) and the orthographic stream (syntactic level), which are often far from isomorphic in French, and the related problem of processing disfluent and/or overlapped strings, which are not represented in the same way in the syntactic and prosodic hierarchies. It then presents the formats adopted to encode the prosodic and syntactic annotations and to query them simultaneously, given that the prosodic architecture is a non-recursive, time-aligned representation while the syntactic one is a recursive, tree-based representation.
Article outline
- 1. Introduction
- 2. The complex data structure in Rhapsodie
- 2.1 Three independent hierarchies
- 2.2 Overlaps in prosody and syntax
- 2.3 Non-alignment of syntactic and prosodic basic units
- 3. Encoding formats and query tools
- 3.1 Tabular format
- 3.2 Trameur: A statistical query tool
- 4. Conclusion
Notes
