Article published in: Cumulative Knowledge Building in Learner Corpus Research
Edited by Tove Larsson and Douglas Biber
[International Journal of Learner Corpus Research 11:1] 2025
pp. 217–244
Phraseological sophistication as a multidimensional construct
Exploring the relationship between association, register specificity and frequency of word combinations
Published online: 29 October 2024
https://doi.org/10.1075/ijlcr.23033.paq
Abstract
Since Paquot (2019), several unresolved issues have persisted regarding
the operationalization of phraseological sophistication in L2 complexity research. One of the most crucial concerns relates to the
extent to which the commonly used measures of phraseological sophistication (MI scores) fully represent the intended construct. In
this study, we draw upon insights from L2 phraseological research to reexamine the conceptualization and operationalization of
phraseological sophistication. We conduct new analyses on the learner corpus used in Paquot (2019), using alternative
operationalizations of phraseological sophistication that represent different dimensions of
sophistication (based on the register specificity of word combinations and their frequency). Results show that measures
representing the dimensions of association (MI scores) and register specificity (ratios of academic collocations) correlate with
each other. Frequency-based measures, however, pattern very differently, which we attribute to some issues in the way we
operationalized frequency of co-occurrence.
Article outline
- 1. Introduction
- 2. Statistical collocations, lexical bundles, n-grams and dimensions of phraseological L2 phraseological studies
- 2.1 The more advanced the learners, the more strongly associated their collocations
- 2.2 The more advanced the learners, the more register-specific their recurrent sequences of words
- 2.3 Frequency matters too
- 2.4 Cumulative research points to the need for a multidimensional approach to phraseological knowledge
- 3. Data and method
- 3.1 Learner corpus
- 3.2 Operationalization of phraseological sophistication
- 3.2.1 Phraseological sophistication by register specificity
- 3.2.2 Phraseological sophistication by frequency
- 3.3 Statistics
- 4. Results
- 4.1 Phraseological sophistication by register specificity
- 4.2 Phraseological sophistication by frequency
- 4.3 Shared patterns of variation
- 5. Discussion
- 6. Conclusion
- Acknowledgements
- Open Data badge
- Notes
References
Ackermann, K., & Chen, Y.-H. (2013). Developing
the Academic Collocation List (ACL) — A corpus-driven and expert-judged approach. Journal of
English for Academic
Purposes, 12(4), 235–247.
Ädel, A., & Erman, B. (2012). Recurrent
word combinations in academic writing by native and non-native speakers of English: A lexical bundles
approach. English for Specific
Purposes, 31(2), 81–92.
Bartsch, S. (2004). Structural
and functional properties of collocations in English. Gunter Narr Verlag.
Bestgen, Y., & Granger, S. (2014). Quantifying
the development of phraseological competence in L2 English writing: An automated
approach. Journal of Second Language
Writing, 26, 28–41.
(2017). Using
collgrams to assess L2 phraseological development: A replication
study. In P. de Haan, R. de Vries, & S. van Vuuren (Eds.), Language,
learners and levels: Progression and
variation (pp. 385–408). Presses Universitaires de Louvain.
(2018). Tracking
L2 writers’ phraseological development using collgrams: Evidence from a longitudinal EFL
corpus. In S. Hoffman & A. Sand (Eds.), Corpora
and
lexis (pp. 277–301). Brill.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The
Longman grammar of spoken and written English. Pearson Education Limited.
Biber, D., Conrad, S., & Cortes, V. (2004). If
you look at …: Lexical bundles in university teaching and textbooks. Applied
Linguistics, 25(3), 371–405.
Bybee, J., & Beckner, C. (2012). Usage-based
theory. In H. Narrog & B. Heine (Eds.), The
Oxford handbook of linguistic
analysis (pp. 827–855). Oxford University Press.
Chen, Y.-H., & Baker, P. (2016). Investigating
criterial discourse features across second language development: Lexical bundles in rated learner essays, CEFR B1, B2 and
C1. Applied
Linguistics, 37(6), 849–880.
Council of Europe. (2001). Common European
framework of reference for languages: Learning, teaching, assessment. Cambridge University Press.
Crossley, S. A., Cai, Z., & McNamara, D. S. (2012). Syntagmatic,
paradigmatic, and automatic n-gram approaches to assessing essay
quality. In P. M. McCarthy & G. M. Youngblood (Eds.), Proceedings
of the 25th International Florida Artificial Intelligence Research Society (FLAIRS)
Conference (pp. 214–219). AAAI Press.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2013). Validating
lexical measures using human scores of lexical proficiency. In S. Jarvis & M. Daller (Eds.), Vocabulary
knowledge: Human ratings and automated
measures (pp. 105–134). John Benjamins.
De Marneffe, M.-C., & Manning, C. (2016). Stanford
typed dependencies manual. [URL]
Durrant, P. (2009). Investigating
the viability of a collocation list for students of English for academic purposes. English for
Specific
Purposes, 28(3), 157–169.
Durrant, P., & Schmitt, N. (2009). To
what extent do native and non-native writers make use of collocations? IRAL — International
Review of Applied Linguistics in Language
Teaching, 47(2), 157–177.
Ellis, N. C. (1996). Sequencing
in SLA: Phonological memory, chunking and points of order. Studies in Second Language
Acquisition, 18(1), 91–126.
Evert, S. (2004). The
statistics of word cooccurrences: Word pairs and collocations. Unpublished PhD
thesis. Universität Stuttgart. [URL]
Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations
in corpus-based language learning research: Identifying, comparing, and interpreting the
evidence. Language
Learning, 67(S1), 155–179.
Garner, J., Crossley, S., & Kyle, K. (2019). N-gram
measures and L2 writing
proficiency. System, 80, 176–187.
Goldberg, A. (2006). Constructions
at work: The nature of generalization in language. Oxford University Press.
Granger, S., & Paquot, M. (2008). Disentangling
the phraseological web. In S. Granger & F. Meunier (Eds.), Phraseology:
An interdisciplinary
perspective (pp. 27–49). John Benjamins.
Granger, S., & Bestgen, Y. (2014). The
use of collocations by intermediate vs. advanced non-native writers: A bigram-based
study. International Review of Applied Linguistics in Language
Teaching, 52(3), 229–252.
Gries, S. Th. (2013). Statistics for linguistics with R. A
practical introduction (2nd ed.). De Gruyter Mouton.
(2022). What do (some of) our
association measures measure (most)? Association? Journal of Second Language
Studies, 5(1), 1–33.
Honnibal, M., & Montani, I. (2017). spaCy
2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental
parsing. [URL]
Hu, R., Wu, J., & Lu, X. (2022). Word-combination-based
measures of phraseological diversity, sophistication, and complexity and their relationship to second language Chinese
proficiency and writing quality. Language
Learning, 72(4), 1128–1169.
Jiang, J., Bi, P., Xie, N., & Liu, H. (2023). Phraseological
complexity and low- and intermediate-level L2 learners’ writing quality. International Review
of Applied Linguistics in Language
Teaching, 61(3), 765–790.
Kim, M., & Crossley, S. A. (2023). Lexical
and phraseological differences between second language written and spoken opinion
responses. Frontiers in
Psychology, 14, 1068685.
Kim, M., Crossley, S. A., & Kyle, K. (2018). Lexical
sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing
quality. The Modern Language
Journal, 102(1), 120–141.
Kim, S., & Kessler, M. (2022). Examining
L2 English university students’ uses of lexical bundles and their relationship to writing
quality. Assessing
Writing, 51, 100589.
Kyle, K., & Crossley, S. A. (2015). Automatically
assessing lexical sophistication: Indices, tools, findings, and application. TESOL
Quarterly, 49(4), 757–786.
Kyle, K., Crossley, S. A., & Berger, C. (2018). The
tool for the analysis of lexical sophistication (TAALES): Version 2.0. Behavior Research
Methods, 50(3), 1030–1046.
Kyle, K., & Eguchi, M. (2021). Automatically
assessing lexical sophistication using word, bigram, and dependency
indices. In S. Granger (Ed.), Perspectives
on the L2 phrasicon: The view from learner
corpora (pp. 126–151). Multilingual Matters.
Laufer, B., & Waldman, T. (2011). Verb-noun
collocations in second language writing: A corpus analysis of learners’ English. Language
Learning, 61(2), 647–672.
Liu, D. (2012). The
most frequently-used multi-word constructions in academic written English: A multi-corpus
study. English for Specific
Purposes, 31(1), 25–35.
McCallum, L. (2020). Relationships
between measures of phraseological complexity and writing quality in a CEFR assessment
context. Arab Journal of Applied
Linguistics, 5(1), 63–99.
Naets, H., Shen, J., & Paquot, M. (2024). A
database of English dependencies with measures of frequency, association, range and keyness. [Data
set]. Open Data @ UCLouvain, V1.
Norris, J., & Ortega, L. (2003). Defining
and measuring SLA. In C. J. Doughty & M. H. Long (Eds.), The
handbook of second language
acquisition (pp. 717–761). Wiley.
Paquot, M. (2018). Phraseological
competence: A missing component in university entrance language tests? Insights from a study of EFL learners’ use of
statistical collocations. Language Assessment
Quarterly, 15(1), 29–43.
Paquot, M. (2019). The
phraseological dimension in interlanguage complexity research. Second Language
Research, 35(1), 121–145.
(2021, August 15–21). Measures
of phraseological complexity: reliability and validity [paper
presentation]. Investigating complexity in L2 phraseology: methods and applications symposium,
AILA World Congress 2021, University of Groningen, The
Netherlands.
Paquot, M., & Granger, S. (2012). Formulaic
language in learner corpora. Annual Review of Applied
Linguistics, 32, 130–149.
Paquot, M., Larsson, T., Hasselgård, H., Ebeling, S. O., De Meyere, D., Valentin, L., Laso, N. J., Verdaguer, I., & van Vuuren, S. (2022b). The
Varieties of English for Specific Purposes dAtabase (VESPA): Towards a multi-L1 and multi-register learner corpus of
disciplinary writing. Research in Corpus
Linguistics, 10(2), 1–15.
Paquot, M., Naets, H., & Gries, S. Th. (2021). Using syntactic
co-occurrences to trace phraseological complexity development in learner writing: verb + object structures in
LONGDALE. In B. Le Bruyn & M. Paquot (Eds.), Learner
corpus research meets second language
acquisition (pp. 122–147). Cambridge University Press.
R Core Team (2022). R: A language and
environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. [URL]
Rogers, J., Müller, A., Daulton, F. E., Dickinson, P., Florescu, C., Reid, G., & Stoeckel, T. (2021). The
creation and application of a large-scale corpus-based academic multi-word unit
list. English for Specific
Purposes, 62, 142–157.
Römer, U. (2009). The
inseparability of lexis and grammar. Corpus linguistic perspectives. Annual Review of Cognitive
Linguistics, 7, 141–163.
Rubin, R., Housen, A., & Paquot, M. (2021). Phraseological
complexity as an index of L2 Dutch writing proficiency: A partial replication
study. In S. Granger (Ed.), Perspectives
on the L2
phrasicon (pp. 101–125). Multilingual Matters.
Schäfer, R. (2015). Processing
and querying large Web corpora with the COW14 architecture. In P. Bański, H. Biber, E. Breiteneder, M. Kupietz, H. Lüngen, & A. Witt (Eds.), Proceedings
of the 3rd workshop on challenges in the management of large
corpora (CMLC-3) (pp. 28–34). Leibniz-Institut für Deutsche Sprache.
Schmitt, N. (2004). Formulaic
sequences: Acquisition, processing and use. John Benjamins.
Staples, S., Egbert, J., Biber, D., & McClair, A. (2013). Formulaic
sequences and EAP writing development: Lexical bundles in the TOEFL iBT writing
section. Journal of English for Academic
Purposes, 12(3), 214–225.
Vandeweerd, N., Housen, A., & Paquot, M. (2021). Applying
phraseological complexity measures to L2 French: A partial replication study. International
Journal of Learner Corpus
Research, 7(2), 197–229.
(2022). Comparing
the longitudinal development of phraseological complexity across oral and written
tasks. Studies in Second Language
Acquisition, 45(4), 1–25.
(2023). Proficiency
at the lexis–grammar interface: Comparing oral versus written French exam tasks. Language
Testing, 40(3), 658–683.
Wolfe-Quintero, K., Inagaki, S., & Kim, H.-Y. (1998). Second
language development in writing: Measures of fluency, accuracy, and complexity. University of Hawaii Press.
Xia, D., Chen, Y., & Pae, H. K. (2022a). Lexical
and grammatical collocations in beginning and intermediate L2 argumentative essays: A bigram
study. IRAL — International Review of Applied Linguistics in Language
Teaching, 61(4), 1421–1453.
Xia, D., Ai, H., & Pae, H. K. (2022b). Please
let me know: Lexical bundles in business emails by business English learners and working
professionals. International Journal of Learner Corpus
Research, 8(1), 1–30.
