Uncovering the extent of word associations and how they are manifested has been an important area of study in corpus linguistics since the 1960s (Sinclair et al. 1970). This paper defines and describes a new way of categorising word association, the concgram, which comprises all of the permutations of constituency and positional variation generated by the association of two or more words. Concgrams are identified by a fully automated search that requires no prior input from the user, other than setting the size of the span, and that reveals all of the word association patterns in a corpus. This study argues that concgrams represent the word associations in a corpus more fully. Most concgrams seem to be non-contiguous and show both constituency variation (AB, ACB) and positional variation (AB, BA). Further studies of concgrams will help in the task of uncovering the full extent of the idiom principle (Sinclair 1987).
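The idea of constituency and positional variation within a span can be illustrated with a minimal sketch. It is not the authors' software or search procedure: the function name `find_two_word_concgrams`, the `span` parameter, and the toy token list are assumptions made for illustration, and the sketch covers only two-word associations, whereas the paper's definition extends to two or more words. For each unordered word pair it records every configuration observed, namely which word comes first (positional variation) and how many words intervene (constituency variation).

```python
from collections import Counter


def find_two_word_concgrams(tokens, span=4):
    """Group co-occurrences of word pairs within `span` tokens.

    Returns a dict mapping each unordered pair (a frozenset) to a Counter of
    configurations (first_word, second_word, gap), where `gap` is the number
    of intervening tokens. Hypothetical helper, for illustration only.
    """
    patterns = {}
    for i, first in enumerate(tokens):
        # Look only to the right; the reversed order (BA) is captured
        # when the other word is itself the starting point.
        for j in range(i + 1, min(i + span + 1, len(tokens))):
            second = tokens[j]
            if second == first:
                continue  # skip a word co-occurring with itself
            pair = frozenset((first, second))
            gap = j - i - 1
            patterns.setdefault(pair, Counter())[(first, second, gap)] += 1
    return patterns


# Toy corpus (an assumption): A and B occur as AB, B_A, A_B (i.e. ACB) and BA.
tokens = ["A", "B", "x", "A", "C", "B", "y", "B", "A"]
patterns = find_two_word_concgrams(tokens, span=2)
ab_configurations = patterns[frozenset(("A", "B"))]
for (first, second, gap), count in sorted(ab_configurations.items()):
    print(first, second, "gap =", gap, "count =", count)
```

Grouping by unordered pair is the design choice that matters here: a contiguous n-gram count would treat AB, BA and ACB as unrelated items, whereas keying on the pair keeps all of their variants together.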