Article published in: International Journal of Corpus Linguistics
Vol. 23:2 (2018), pp. 158–182
Investigating effects of criterial consistency, the diversity dimension, and threshold variation in formulaic language research
Extending the methodological considerations of O’Donnell et al. (2013)
Published online: 5 October 2018
https://doi.org/10.1075/ijcl.16086.lu
Abstract
O’Donnell et al. (2013) considered four measures of formulaicity and reported that they produced different results concerning the effects of expertise and first/second language status on formulaic sequence usage in academic writing. The current study explores several additional methodological issues using the same dataset from O’Donnell et al. (2013). We first motivate the need for criterial consistency and investigate whether frequency- and association-based measures yield different results when they are both obtained using corpus-internal criteria. The informativeness of the diversity dimension of formulaic sequence use is then gauged by comparing the results of phrase-frame type-token ratio against those of other measures. Finally, we profile formulaic sequence distribution across quartiles of different measures to assess the effect of variable measure thresholds. Our findings highlight the criticality of issues of criterial consistency, formulaic sequence diversity, and threshold variation in formulaic language research.
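The measures named in the abstract can be illustrated with a minimal sketch. The definitions below are common textbook formulations and are assumptions made here for illustration; they are not necessarily the exact operationalizations used in the study or in O’Donnell et al. (2013). The function names (`ngrams`, `ngram_mi`, `pframe_ttr`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_mi(tokens, n):
    """Score each n-gram type with a pointwise mutual information value.

    Assumed definition: log2(P(n-gram) / product of P(word)), with all
    probabilities estimated from the same text (a corpus-internal criterion).
    """
    word_counts = Counter(tokens)
    gram_counts = Counter(ngrams(tokens, n))
    total_words = len(tokens)
    total_grams = sum(gram_counts.values())
    scores = {}
    for gram, count in gram_counts.items():
        p_gram = count / total_grams
        p_independent = math.prod(word_counts[w] / total_words for w in gram)
        scores[gram] = math.log2(p_gram / p_independent)
    return scores


def pframe_ttr(tokens, n, slot):
    """Compute a type-token ratio for each phrase-frame (p-frame).

    A p-frame is taken here to be an n-gram with the word at index `slot`
    left open. The ratio is the number of distinct fillers (types) divided
    by the number of occurrences of the frame (tokens).
    """
    fillers = defaultdict(Counter)
    for gram in ngrams(tokens, n):
        frame = gram[:slot] + ("*",) + gram[slot + 1:]
        fillers[frame][gram[slot]] += 1
    return {frame: len(c) / sum(c.values()) for frame, c in fillers.items()}
```

Under these assumptions, a frame such as `in the * of` that occurs three times with two distinct fillers receives a type-token ratio of 2/3. A higher ratio indicates a more variable slot, which is the diversity dimension that frequency counts alone do not capture.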
Article outline
- 1. Introduction
- 2. Methodological issues in formulaic sequence identification and extraction
- 3. Motivation for the current study
- 4. Method
  - 4.1 Data
  - 4.2 Measures
    - 4.2.1 N-gram frequency
    - 4.2.2 N-gram MI
    - 4.2.3 P-frame frequency
    - 4.2.4 P-frame TTR
  - 4.3 Procedure
- 5. Results
  - 5.1 Research question 1: Corpus-internal vs. corpus-external MI thresholds
  - 5.2 Research questions 2 and 3: Effects of expertise
    - 5.2.1 Frequency-based n-grams
    - 5.2.2 MI-defined formulas
    - 5.2.3 P-frames
    - 5.2.4 P-frame TTR
  - 5.3 Research questions 2 and 3: Effects of L1/L2 status
    - 5.3.1 Frequency-based n-grams
    - 5.3.2 MI-defined formulas
    - 5.3.3 P-frames
    - 5.3.4 P-frame TTR
- 6. Discussion
- 7. Conclusions
- Acknowledgements
- Notes
References
Bannard, C., & Lieven, E. (2012). Formulaic language in L1 acquisition. Annual Review of Applied Linguistics, 32(1), 3–16.
Biber, D. (2006). University Language: A Corpus-Based Study of Spoken and Written Registers. Amsterdam/Philadelphia: John Benjamins.
Biber, D. (2009). A corpus-driven approach to formulaic language in English: Multi-word patterns in speech and writing. International Journal of Corpus Linguistics, 14(3), 275–311.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at …: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman Grammar of Spoken and Written English. New York/London: Longman.
Conklin, K., & Schmitt, N. (2012). The processing of formulaic language. Annual Review of Applied Linguistics, 32(1), 45–61.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23(4), 397–423.
Durrant, P., & Doherty, A. (2010). Are high-frequency collocations psychologically real? Investigating the thesis of collocational priming. Corpus Linguistics and Linguistic Theory, 6(2), 125–155.
Ellis, N. C. (2012). Formulaic language and Second Language Acquisition: Zipf and the phrasal Teddy Bear. Annual Review of Applied Linguistics, 32(1), 17–44.
Eskildsen, S. W. (2009). Constructing another language – Usage-based linguistics in second language acquisition. Applied Linguistics, 30(3), 335–357.
Eskildsen, S. W., & Cadierno, T. (2007). Are recurring multi-word expressions really syntactic freezes? Second language acquisition from the perspective of usage-based linguistics. In M. Nenonen & S. Niemi (Eds.), Collocations and Idioms 1: Papers from the First Nordic Conference on Syntactic Freezes (pp. 86–99). Joensuu: Joensuu University Press.
Evert, S. (2008). Corpora and collocations. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics. An International Handbook (pp. 1212–1248). Berlin: Mouton de Gruyter.
Granger, S. (1996). From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg & M. Johansson (Eds.), Languages in Contrast: Paper from a Symposium on Text-based Cross-linguistic Studies (pp. 37–51). Lund: Lund University Press.
Granger, S. (2003). The International Corpus of Learner English: A new resource for foreign language learning and teaching and second language acquisition research. TESOL Quarterly, 37(3), 538–546.
Granger, S., & Meunier, F. (2008). Phraseology: An Interdisciplinary Perspective. Amsterdam/Philadelphia: John Benjamins.
Gries, S., & Wulff, S. (2005). Do foreign language learners also have constructions? Evidence from priming, sorting, and corpora. Annual Review of Cognitive Linguistics, 3(1), 182–200.
Herbst, T. (2011). Choosing sandy beaches – collocations, probabemes and the idiom principle. In T. Herbst, S. Faulhaber & P. Uhrig (Eds.), The Phraseological View of Language (pp. 27–57). Berlin: Walter de Gruyter.
Hyland, K. (1998). Hedging in Scientific Research Articles. Amsterdam/Philadelphia: John Benjamins.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16(3), 307–322.
Lieven, E., & Tomasello, M. (2008). Children’s first language acquisition from a usage-based perspective. In P. Robinson & N. C. Ellis (Eds.), Handbook on Cognitive Linguistics and Second Language Acquisition (pp. 168–196). New York, NY: Routledge.
McEnery, T., & Hardie, A. (2014). Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.
Manning, C., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
McEnery, T., & Wilson, A. (2004). Corpus Linguistics: An Introduction. Edinburgh: Edinburgh University Press.
Mel’čuk, I. (1998). Collocations and lexical functions. In A. P. Cowie (Ed.), Phraseology: Theory, Analysis, and Applications (pp. 23–53). Oxford: Clarendon Press.
Nesselhauf, N. (2005). Collocations in a Learner Corpus. Amsterdam/Philadelphia: John Benjamins.
O’Donnell, M., Römer, U., & Ellis, N. C. (2013). The development of formulaic sequences in first and second language writing: Investigating effects of frequency, association, and native norm. International Journal of Corpus Linguistics, 18(1), 83–108.
Paquot, M. B., & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of Applied Linguistics, 32(1), 130–149.
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. Richards & R. Schmidt (Eds.), Language and Communication (pp. 191–225). New York/London: Longman.
Pivovarova, L., Kormacheva, D., & Kopotev, M. (2017). Evaluation of collocation extraction methods for the Russian language. In M. Kopotev, O. Lyashevskaja & A. Mustajoki (Eds.), Quantitative Approaches to the Russian Language. New York, NY: Routledge.
Römer, U. (2010). Establishing the phraseological profile of a text type: The construction of meaning in academic book reviews. English Text Construction, 3(1), 95–119.
Römer, U., & O’Donnell, M. B. (2011). From student hard drive to web corpus (part 1): The design, compilation and genre classification of the Michigan Corpus of Upper-level Student Papers (MICUSP). Corpora, 6(2), 159–177.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action. An introduction. In N. Schmitt (Ed.), Formulaic Sequences (pp. 2–22). Amsterdam/Philadelphia: John Benjamins.
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology research. Applied Linguistics, 31(4), 487–512.
Tomasello, M. (2003). Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press.
