Article published in: Australian Review of Applied Linguistics
Vol. 49:1 (2026), pp. 58–86
Compiling the first spoken corpus for Turkish youth talk
Overview of the corpus and methodological issues
Published online: 28 July 2025
https://doi.org/10.1075/aral.25007.efe
Abstract
This paper addresses issues related to the design and compilation of the first spoken corpus of youth talk in an
under-represented language in corpus linguistics, Turkish. Designed to offer a maximally representative sample of Turkish youth
talk, the Corpus of Turkish Youth Language (CoTY) is a 168,748-token specialised corpus within the single register of informal,
naturally occurring and spontaneous interaction exclusively among friends. The speakers are Turkish-speaking youth aged 14 to 18
from diverse socio-economic backgrounds in Türkiye. In this paper, the issues that surfaced during corpus design and construction
are presented, with a discussion and justification of the methodological choices in relation to the long-term project objectives.
The corpus contributes to the field as a valuable resource and tool for cross-linguistic youth language research. As an
overarching goal, the project also aims to expand the cumulative linguistic and methodological knowledge in spoken
corpus design and construction.
Keywords: spoken corpus, youth talk, corpus construction, corpus design, Turkish
Article outline
- 1. Introduction
- 2. Youth talk corpora
- 3. Compiling spoken data
- 3.1 Defining the sample
- 3.2 Recruiting the participants
- 3.3 Collecting spoken data
- 4. Corpus building
- 5. The corpus of Turkish youth language (CoTY)
- 5.1 Corpus description
- 5.2 Utilising CoTY: Is there a distinct Turkish youth language?
- 6. Conclusion
- Ethics
- Declaration of AI use
- Notes
- References
