Lectal constraining of lexical collocations: How a word’s company is influenced by the usage settings

Tummers, Jose; Speelman, Dirk; Heylen, Kris; Geeraerts, Dirk

doi:10.1075/cf.7.1.01tum

Article published in: Constructions and Frames
Vol. 7:1 (2015), pp. 1–46

Jose Tummers | University College Leuven

Dirk Speelman | KU Leuven, Quantitative Lexicology and Variational Linguistics

Kris Heylen

Dirk Geeraerts

Published online: 28 January 2016

https://doi.org/10.1075/cf.7.1.01tum

This study adopts a corpus-based approach to reconsider lexical collocations from a lectal perspective. An analysis of adjective-noun collocations shows that lexical collocations are conditioned by the language settings in which they are used. These lectal constraints apply not only to lexical collocations as a measure of lexical association but also to their potential function as a determinant of other constructions. The results argue for including the heterogeneity of the corpus settings in empirical linguistic models and for integrating a full-fledged lectal dimension into theoretical frameworks that advocate a usage-based methodology, such as construction grammar.

Keywords: lexical collocations, phrasal names, lectal constraining, usage-based linguistics, mixed-effects modeling

Cited by (1)

Dekalo, Volodymyr

2021. Exploring relative degrees of auxiliarization empirically in German modal constructions with wissen and verstehen. In Modality and Diachronic Construction Grammar [Constructional Approaches to Language, 32], pp. 53 ff.
