Article published in: Language and Linguistics
Vol. 19:4 (2018), pp. 525–548
Identifying lexical bundles in Chinese
Methodological issues and an exploratory data analysis
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 10 October 2018
https://doi.org/10.1075/lali.00019.hsu
Abstract
Recurrent word sequences, referred to as “lexical bundles”, may be structurally incomplete, but they serve important communicative functions. Despite the essential roles of lexical bundles in discourse, many methodological issues have been raised in the process of identifying lexical bundles, which is generally frequency-based. The present study identifies three-word and four-word bundles in Chinese conversation and news, and efforts are made to respond to methodological challenges encountered in previous studies. We employ a more sensitive dispersion measure, DP, and an internal association measure, G, which help filter out high-frequency word sequences with no identifiable function and reduce the workload of further manual interventions. An exploratory data analysis is then conducted to compare the distributional patterns of lexical bundles in Chinese conversation and news. In Chinese, both the type number and the density of lexical bundles are higher in conversation than in news. This appears to be a strong cross-linguistic tendency that reflects the real-time pressure speakers face in spontaneous speech. The exploratory data analysis also shows that the elements in Chinese bundles are closely associated with each other. This suggests that lexical bundles are useful phrasal units in Chinese discourse, and thus invites further investigations of how lexical bundles are used in Chinese.
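As a rough illustration of the dispersion measure named in the abstract, the sketch below computes DP (deviation of proportions) for one word sequence over a corpus split into parts. It assumes the standard formulation of DP: half the summed absolute differences between each part's share of the sequence's occurrences and that part's share of the corpus. The function names, the toy word-segmented data, and the choice of one part per text are illustrative assumptions, not the study's actual procedure or thresholds. The association measure G is not sketched here because its definition is not given in this excerpt.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def dp(sequence, parts, n):
    """Deviation of proportions (DP) of one n-gram over corpus parts.

    parts: list of token lists, one per corpus part (hypothetical split).
    A value near 0 means the sequence is spread in proportion to the
    part sizes; a value near 1 means it is concentrated in a small
    portion of the corpus.
    """
    sizes = [len(p) for p in parts]
    corpus_size = sum(sizes)
    counts = [Counter(ngrams(p, n))[sequence] for p in parts]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence does not occur in the corpus")
    return 0.5 * sum(
        abs(c / total - s / corpus_size) for c, s in zip(counts, sizes)
    )


# Toy, already word-segmented corpus (invented for illustration only).
toy_parts = [
    ["我", "不", "知道", "我", "不", "知道"],
    ["我", "不", "知道", "你", "好", "嗎"],
    ["今天", "天氣", "很", "好", "對", "吧"],
]
bundle = ("我", "不", "知道")
print(round(dp(bundle, toy_parts, 3), 3))  # prints 0.333
```

In a frequency-based pipeline of the kind the abstract describes, a score like this could be used to discard sequences that are frequent only because they recur within a few texts; any cut-off value would have to come from the study itself.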
Article outline
- 1. Introduction
- 2. Methodological issues in identifying lexical bundles
  - 2.1 Issues relating to the corpus
  - 2.2 Issues relating to the length of lexical bundles
  - 2.3 Issues relating to the quantitative criteria
  - 2.4 Issues relating to manual interventions
  - 2.5 An interim summary
- 3. Identifying lexical bundles in Chinese
  - 3.1 Extracting high-frequency word sequences
  - 3.2 Dispersion thresholds
  - 3.3 Association threshold
  - 3.4 Other methodological issues and practical solutions
- 4. Results and discussion
- 5. Conclusion
- Acknowledgements
- Notes
