Article published in: International Journal of Corpus Linguistics
Vol. 6:2 (2001), pp. 171–197
Designing CoSIH: The Corpus of Spoken Israeli Hebrew
Published online: 8 August 2002
https://doi.org/10.1075/ijcl.6.2.01izr
This paper describes the initial design of the Corpus of Spoken Israeli Hebrew (CoSIH). CoSIH will attempt to represent most varieties of spoken Hebrew as it is used in Israel today. It is designed to consist of two complementary corpora: a main corpus and a supplementary corpus. The main corpus, which will comprise about 90% of the entire collection, will be sampled statistically. For analytical purposes it will use a conceptual tool in the form of a multidimensional matrix combining demographic and contextual tiers. The combined demographic and contextual design will be capable of showing the distribution of speech types in various subgroups of the population. The supplementary corpus will include about 10% of the collected data, and will add to the statistically sampled corpus some targeted, demographically sampled texts and a contextually designed collection. The design is culturally dependent: it is tailored to the special structure of the Israeli Hebrew speech community and thus includes both native and non-native speakers of Hebrew. Nonetheless, the principles governing the design would serve the study of many other speech communities, to the extent that the design itself may be employed for other corpora with only slight modifications.
Keywords: Israeli Hebrew, corpus design, spoken corpus
