Article published in: International Journal of Corpus Linguistics
Vol. 25:2 (2020), pp. 216–230
short paper
Methodological issues in contrastive lexical bundle research
The influence of corpus design on bundle identification
Published online: 28 August 2020
https://doi.org/10.1075/ijcl.19063.pan
Abstract
This study explores the influence of corpus design when comparing lexical bundle use across groups, examining how the number of texts and the average length of texts can affect conclusions about group differences. The study compares the use of lexical bundles by L1-English versus L2-English writers, based on analysis of two sub-corpora of academic articles that are matched for discipline, writer expertise, time of publication, and audience. However, the two sub-corpora differ with respect to the number of texts and the average length of texts. Three experiments examined the influence of these differences in corpus composition. The results show that differences in the number of words and the number of texts across sub-corpora can have a strong effect on claimed differences in bundle use across groups. This effect is found even when the texts in the corpora are closely matched for register and topic.
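To make concrete why corpus size and the number of texts matter, the following is a minimal sketch of threshold-based bundle identification. It is not the procedure used in the study: the function names, the whitespace tokenization, and the default thresholds (`min_per_million`, `min_texts`) are assumptions chosen for illustration only. The sketch shows that whether a word sequence counts as a bundle depends on both the total word count (through the normalized rate) and the number of texts it occurs in (through the range criterion).

```python
from collections import Counter


def ngrams(tokens, n=4):
    """Return all contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def find_bundles(texts, n=4, min_per_million=40.0, min_texts=2):
    """Identify n-word sequences that meet a frequency and a range threshold.

    The thresholds are illustrative assumptions, not values from the study.
    """
    freq = Counter()        # raw frequency of each sequence in the corpus
    text_range = Counter()  # number of texts each sequence occurs in
    total_words = 0
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        total_words += len(tokens)
        grams = ngrams(tokens, n)
        freq.update(grams)
        text_range.update(set(grams))  # count each text at most once

    bundles = {}
    for gram, count in freq.items():
        # Normalized rate depends on corpus size, so the same raw count
        # can pass in a small corpus and fail in a large one.
        rate = count * 1_000_000 / total_words
        if rate >= min_per_million and text_range[gram] >= min_texts:
            bundles[" ".join(gram)] = (count, rate, text_range[gram])
    return bundles


# Two short texts sharing one sequence versus one text repeating it.
two_texts = ["on the other hand we find that", "on the other hand it is clear"]
one_text = ["on the other hand on the other hand"]

spread = find_bundles(two_texts)
concentrated = find_bundles(one_text)
```

In this toy example the sequence "on the other hand" has the same raw frequency in both samples, but it only qualifies as a bundle when it is spread over two texts. A comparison between sub-corpora with different numbers of texts or different text lengths applies such criteria under unequal conditions, which is the kind of design effect the abstract describes.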
Article outline
- 1. Introduction
- 2. Methodology
- 2.1 Corpora
- 2.2 Overview of the experiments
- 3. Results and discussion
- 4. Conclusions
- Acknowledgements
- References
