In: Multiple Affordances of Language Corpora for Data-driven Learning
Edited by Agnieszka Leńko-Szymańska and Alex Boulton
[Studies in Corpus Linguistics 69] 2015
pp. 65–84
Learning phraseology from speech corpora
Published online: 13 May 2015
https://doi.org/10.1075/scl.69.04ast
There is substantial evidence that phraseology is key to fluency in speech production and reception, particularly in cognitively and affectively demanding contexts such as interpreting. Yet most second language speakers have limited repertoires of phraseological items, lacking knowledge not only of their lexicogrammatical and functional aspects but also of their prosodic ones. Speech corpora that align transcripts with audio can readily be constructed from subtitled video materials, and learners can use these to view and hear concordanced data. Examples are provided for phraseological items documented in a one-million-word corpus of talks from the TED – Ideas worth spreading website (www.ted.com), analysed using WordSmith Tools (Scott 2012). Activities to be carried out with and by learners are also suggested, aimed at increasing their phraseological awareness and expanding their repertoires.
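The abstract notes that subtitled video can be turned into a speech corpus whose concordance lines lead back to the audio. As a minimal sketch of that idea, assuming the subtitles are available as SubRip (.srt) text, the Python below splits the file into time-stamped cues and returns keyword-in-context lines tagged with the start time of each cue, so that a learner could jump to that moment in the video. The helper names `parse_srt` and `kwic` and the sample subtitles are hypothetical illustrations; this is not the chapter's procedure, which used WordSmith Tools.

```python
import re
from dataclasses import dataclass

# Matches an SRT time line such as "00:01:02,500 --> 00:01:05,000".
TIME_LINE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Split SubRip text into time-stamped cues (hypothetical helper)."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIME_LINE.search(line)
            if match:
                g = match.groups()
                # Subtitle text may run over several lines after the time line.
                text = " ".join(ln.strip() for ln in lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def kwic(cues: list[Cue], phrase: str, width: int = 30):
    """Return (start_time, left, node, right) for each hit of `phrase`.

    Matching is case-insensitive and limited to a single cue, so items
    split across two subtitles are missed in this simplified sketch.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = []
    for cue in cues:
        for m in pattern.finditer(cue.text):
            left = cue.text[max(0, m.start() - width):m.start()]
            right = cue.text[m.end():m.end() + width]
            hits.append((cue.start, left, m.group(0), right))
    return hits


# Hypothetical sample subtitles, not taken from the TED corpus.
sample = """1
00:00:01,000 --> 00:00:03,500
So what I want to do today

2
00:00:03,500 --> 00:00:06,000
is tell you a story. What I want
to show you is this.
"""

hits = kwic(parse_srt(sample), "what I want")
for start, left, node, right in hits:
    print(f"{start:7.2f}s {left:>30}[{node}]{right}")
```

Restricting matches to a single cue keeps each time code unambiguous, at the cost of missing phrases that straddle two subtitles; a fuller tool might join adjacent cues before searching.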
Keywords: interpreting, phraseology, speech corpora, subtitles, TED talks
References (63)
Arnon, I. & Snider, N. 2010. More than words: Frequency effects for multi-word phrases. Journal of Memory and Language 62(1): 67–82.
Aston, G. 2001. Corpora in language pedagogy: An overview. In Learning with Corpora, G. Aston (ed.), 7–45. Houston TX: Athelstan.
Bardovi-Harlig, K. & Vellenga, H. 2012. The effect of instruction on conventional expressions in L2 pragmatics. System 40(1): 77–89.
Biber, D. 2006. University Language: A Corpus-Based Study of Spoken and Written Registers [Studies in Corpus Linguistics 23]. Amsterdam: John Benjamins.
Biber, D. 2009. A corpus-driven approach to formulaic language in English. International Journal of Corpus Linguistics 14(3): 275–311.
Biber, D. & Conrad, S. 1999. Lexical bundles in conversation and academic prose. In Out of Corpora, S. Oksefjell & H. Hasselgård (eds), 181–189. Amsterdam: Rodopi.
Biber, D., Conrad, S. & Cortes, V. 2004. ‘If you look at…’: Lexical bundles in university teaching and textbooks. Applied Linguistics 25(3): 371–405.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education.
Bybee, J. 2002. Phonological evidence for exemplar storage of multiword sequences. Studies in Second Language Acquisition 24(2): 215–221.
Chafe, W. 1988. Punctuation and the prosody of written language. Written Communication 5(4): 395–426.
Coleman, J. 2012. Mining a year of speech. Presentation at TGE Adonis, Orléans, 5 April. <[URL]> (8 May 2014).
Conklin, K. & Schmitt, N. 2008. Formulaic sequences: Are they processed more quickly than nonformulaic language by native and nonnative speakers? Applied Linguistics 29(1): 72–89.
Coxhead, A. & Walls, R. 2012. TED talks, vocabulary, and listening for EAP. TESOLANZ Journal 20(1): 55–67.
Crossley, S. & Salsbury, T. 2011. The development of lexical bundle accuracy and production in English second language speakers. International Review of Applied Linguistics in Language Teaching 49(1): 1–26.
Ellis, N. 2012. Formulaic language and second language acquisition: Zipf and the phrasal teddy bear. Annual Review of Applied Linguistics 32: 17–44.
Erman, B. 2007. Cognitive processes as evidence of the idiom principle. International Journal of Corpus Linguistics 12(1): 25–53.
Foster, P. 2001. Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers. In Researching Pedagogic Tasks: Second Language Learning, Teaching and Testing, M. Bygate, P. Skehan & M. Swain (eds), 75–93. London: Longman.
Gavioli, L. & Aston, G. 2001. Enriching reality: Language corpora in language pedagogy. ELT Journal 55(3): 238–246.
Gile, D. 1997. Conference interpreting as a cognitive management problem. In Cognitive Processes in Translation and Interpreting, J. Danks, G. Shreve, S. Fountain & M. McBeath (eds), 196–214. London: Sage.
Glavitsch, U., Simon, K. & Szakos, J. 2011. SpeechIndexer: A flexible software for audio–visual language learning. In Proceedings of the International Conference on Education, Informatics, and Cybernetics; International Symposium on Integrating Research, Education, and Problem-Solving (post-conference edition), N. Callaos, H.-W. Chu, J. Horne & F. Welsch (eds), 91–94. <[URL]> (8 May 2014).
Gray, B. & Biber, D. 2013. Lexical frames in academic prose and conversation. International Journal of Corpus Linguistics 18(1): 109–135.
Hasselgren, A. 1994. Lexical teddy bears and advanced learners: A study into the ways Norwegian students cope with English vocabulary. International Journal of Applied Linguistics 4(2): 237–258.
Henriksen, L. 2007. The song in the booth: Formulaic interpreting and oral textualisation. Interpreting 9(1): 1–20.
Kuiper, K. 1996. Smooth Talkers: The Linguistic Performance of Auctioneers and Sportscasters. London: Routledge.
Kuiper, K. 2004. Formulaic performance in conventionalised varieties of speech. In Formulaic Sequences: Acquisition, Processing and Use [Language Learning and Language Teaching 9], N. Schmitt (ed.), 37–54. Amsterdam: John Benjamins.
Kuiper, K., Columbus, G. & Schmitt, N. 2009. Acquiring phrasal vocabulary. In Advances in Language Acquisition, S. Foster-Cohen (ed.), 216–240. Basingstoke: Palgrave Macmillan.
Lin, P. 2010a. The phonology of formulaic sequences: A review. In Perspectives on Formulaic Language: Acquisition and Communication, D. Wood (ed.), 174–193. London: Continuum.
Lin, P. 2010b. The Prosody of Formulaic Language. PhD dissertation, University of Nottingham.
Lin, P. 2012. Sound evidence: The missing piece of the jigsaw in formulaic language research. Applied Linguistics 33(3): 342–347.
Lin, P. 2013. The prosody of formulaic expressions in the IBM/Lancaster Spoken English Corpus. International Journal of Corpus Linguistics 18(4): 561–588.
Martin, P. 2011. WinPitch: A multimodal tool for speech analysis of endangered languages. In Interspeech-2011, 3273–3276.
Mel’čuk, I. 1998. Collocations and lexical functions. In Phraseology, A. Cowie (ed.), 23–53. Oxford: OUP.
Millar, N. 2011. The processing of malformed formulaic language. Applied Linguistics 32(2): 129–148.
Nesi, H. & Basturkmen, H. 2006. Lexical bundles and discourse signaling in academic lectures. International Journal of Corpus Linguistics 11(3): 283–304.
O’Donnell, M.B., Römer, U. & Ellis, N. 2013. The development of formulaic sequences in first and second language writing: Investigating effects of frequency, association, and native norm. International Journal of Corpus Linguistics 18(1): 83–108.
Pawley, A. & Syder, F. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. Richards & R. Schmidt (eds), 191–225. London: Longman.
Pawley, A. & Syder, F. 2000. The one clause at a time hypothesis. In Perspectives on Fluency, H. Riggenbach (ed.), 163–191. Ann Arbor MI: University of Michigan Press.
Pigada, M. & Schmitt, N. 2006. Vocabulary acquisition from extensive reading: A case study. Reading in a Foreign Language 18(1): 1–28.
Raupach, M. 1984. Formulae in second language speech production. In Second Language Production, H. Dechert, D. Möhle & M. Raupach (eds), 114–137. Tübingen: Gunter Narr.
Römer, U. 2011. Observations on the phraseology of academic writing: Local patterns – local meanings? In The Phraseological View of Language: A Tribute to John Sinclair, T. Herbst, S. Faulhaber & P. Uhrig (eds), 211–227. Berlin: Mouton de Gruyter.
Simpson-Vlach, R. & Ellis, N. 2010. An academic formulas list: New methods in phraseology research. Applied Linguistics 31(4): 487–512.
Sinclair, J. McH. 1987. Collocation: A progress report. In Language Topics, II: Essays in Honour of Michael Halliday, R. Steele & T. Threadgold (eds), 319–332. Amsterdam: John Benjamins.
Sinclair, J. McH. 2004. Interview with John Sinclair, conducted by Wolfgang Teubert. In English Collocation Studies: The OSTI Report, R. Krishnamurthy, R. Daley, S. Jones & J. Sinclair (eds), xvii–xxix. London: Continuum.
Sinclair, J. McH. 2008. The phrase, the whole phrase and nothing but the phrase. In Phraseology: An Interdisciplinary Perspective, S. Granger & F. Meunier (eds), 407–410. Amsterdam: John Benjamins.
Strik, H., Hulsbosch, M. & Cucchiarini, C. 2010. Analyzing and identifying multiword expressions in spoken language. Language Resources and Evaluation 44(1–2): 41–58.
Tavakoli, P. 2011. Pausing patterns: Differences between L2 learners and native speakers. ELT Journal 65(1): 71–79.
Vogel Sosa, A. & MacFarlane, J. 2002. Evidence for frequency-based constituents in the mental lexicon: Collocations involving the word ‘of’. Brain and Language 83(2): 227–236.
Walczyk, J., Griffith, D., Yates, R., Visconte, S., Simoneaux, B. & Harris, L. 2012. Lie detection by inducing cognitive load: Eye movements and other cues to the false answers of ‘witnesses’ to crimes. Criminal Justice and Behavior 39(7): 887–909.
Cited by (9)
Cacciato, Alessandra & Jarvis Looi
Boers, Frank, Thuy Bui, Julie Deconinck, Hélène Stengers & Averil Coxhead
Gaber, Mahmoud
Wu, Yinyin
Wu, Yinyin
Corpas Pastor, Gloria
Corpas Pastor, Gloria & Fernando Sánchez Rodas
2021. Now what? In Corpora in Translation and Contrastive Research in the Digital Age [Benjamins Translation Library 158], pp. 23 ff.
Bui, Thuy, Frank Boers & Averil Coxhead
2020. Extracting multiword expressions from texts with the aid of online resources. ITL – International Journal of Applied Linguistics 171:2, pp. 221 ff.
This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
