A new computing method for extracting contiguous phraseological sequences from academic text corpora

Wei, Naixing; Li, Jingjie

doi:10.1075/ijcl.18.4.03wei

Article published in: International Journal of Corpus Linguistics
Vol. 18:4 (2013) ► pp. 506–535

Naixing Wei | Beihang University

Jingjie Li | Donghua University

Published online: 5 December 2013

https://doi.org/10.1075/ijcl.18.4.03wei

This study aims to develop a new computing method for extracting contiguous phraseological sequences (PSs) of various lengths from academic text corpora by measuring internal associations of n-grams. We construct a new normalizing algorithm of probability-weighted average for refining the MI measure and enhancing precision in extracting PSs from corpora. This computing method is applied to the data in a medium-sized text corpus of academic English. Results indicate that the resultant new MI measure can provide statistics which better reveal internal associations within an n-gram, regardless of size. Lexico-grammatical sequences extracted with this method are more complete and less arbitrary in terms of grammar and semantics. The method can be applied to treating a variety of linguistic phenomena, ranging from well-established phrases to likely phrasal entities, thus having potentially practical applications in corpus-based studies of phraseology and natural language processing.

Keywords: internal association, n-grams, phraseology, probability-weighted average, pseudo-bigram transformation
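
The abstract names the ingredients of the method (an MI measure, a probability-weighted average, a pseudo-bigram transformation) but does not state the formula. The Python sketch below is only one plausible reading, offered for illustration and not as the authors' algorithm. It assumes that an n-gram is cut at each internal position into a left and a right part (a "pseudo-bigram"), that the expected probability of each cut is the product of the two parts' probabilities, and that these products are averaged with weights proportional to the products themselves. The function names, the toy token list, and the use of the token count as the denominator for every n-gram length are all assumptions made for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every contiguous n-gram of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def prob(counts, gram, total):
    """Relative frequency of an n-gram (assumption: same denominator for all n)."""
    return counts[gram] / total


def split_products(counts, gram, total):
    """Pseudo-bigram view: P(left) * P(right) for each internal cut point."""
    return [
        prob(counts, gram[:k], total) * prob(counts, gram[k:], total)
        for k in range(1, len(gram))
    ]


def weighted_mi(counts, gram, total):
    """MI of an observed n-gram against a probability-weighted average
    of its split products (the weighting scheme is an assumption)."""
    products = split_products(counts, gram, total)
    weight_sum = sum(products)
    # Each product is weighted by its own share of the total.
    expected = sum(p * p for p in products) / weight_sum
    return math.log2(prob(counts, gram, total) / expected)


# Toy corpus, purely illustrative.
tokens = "in terms of the results in terms of the data on the other hand".split()
total = len(tokens)
counts = ngram_counts(tokens, max_n=3)

bigram_score = weighted_mi(counts, ("in", "terms"), total)
trigram_score = weighted_mi(counts, ("in", "terms", "of"), total)
```

For a bigram there is a single cut, so the weighted average collapses to P(x)P(y) and the score equals ordinary pointwise MI. For longer n-grams the average keeps the score on a scale comparable across lengths, which is the property the abstract attributes to the new measure. The function assumes the n-gram actually occurs in the corpus; an unseen n-gram would need separate handling.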

Cited by (15)

Li, Jingjie & Wenjie Hu

2025. Identification of sentence stems characteristic of Chinese learner English writing. Heliyon 11:3 ► pp. e37166 ff.

Zheng, Lujie, Sheena Kaur & Azlin Zaiti Zainal

2025. The influence of working memory and proficiency on phraseological growth: A longitudinal study of adjective-noun combinations in Chinese EFL learners’ argumentative writing. Assessing Writing 64 ► pp. 100915 ff.

Hu, Fumao

2024. Chunk Extraction in Business English Correspondences. In An MT-Oriented Study of Corresponding Lexical Chunks in Business Correspondences from English to Chinese, ► pp. 37 ff.

Zhou, Qihong & Li Mou

2024. A Corpus-Based Study of Lexical Chunks in Chinese Academic Discourse: Extraction, Classification, and Application. In Chinese Lexical Semantics [Lecture Notes in Computer Science, 14515], ► pp. 257 ff.

Hsu, Chan-Chia & Shu-Kai Hsieh

2022. Identifying lexical bundles in Chinese. Language and Linguistics. 語言暨語言學 ► pp. 525 ff.

Chen, Alvin C.-H.

2021. Durational Patterns of Recurrent Multiword Combinations in Mandarin Spontaneous Speech Production. Language and Speech 64:3 ► pp. 742 ff.

Buerki, Andreas

2020. Formulaic Language and Linguistic Change.

Polio, Charlene & Hyung-Jo Yoon

2020. Exploring Multi-Word Combinations as Measures of Linguistic Accuracy in Second Language Writing. In Learner Corpus Research Meets Second Language Acquisition, ► pp. 96 ff.

Chen, Alvin Cheng‐Hsien

2019. Assessing Phraseological Development in Word Sequences of Variable Lengths in Second Language Texts Using Directional Association Measures. Language Learning 69:2 ► pp. 440 ff.

García Salido, Marcos, Marcos Garcia & Margarita Alonso-Ramos

2019. Identifying Lexical Bundles for an Academic Writing Assistant in Spanish. In Computational and Corpus-Based Phraseology [Lecture Notes in Computer Science, 11755], ► pp. 144 ff.

Dobrovoljc, Kaja

2017. Multi-word discourse markers and their corpus-driven identification. International Journal of Corpus Linguistics 22:4 ► pp. 551 ff.

Dunn, Jonathan

2017. Computational learning of construction grammars. Language and Cognition 9:2 ► pp. 254 ff.

Jeaco, Stephen

2017. Helping Language Learners Put Concordance Data in Context. International Journal of Computer-Assisted Language Learning and Teaching 7:2 ► pp. 22 ff.

Jeaco, Stephen

2020. Helping Language Learners Put Concordance Data in Context. In Language Learning and Literacy, ► pp. 71 ff.

Yoon, Hyung-Jo

2016. Association strength of verb-noun combinations in experienced NS and less experienced NNS writing: Longitudinal and cross-sectional findings. Journal of Second Language Writing 34 ► pp. 42 ff.

This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.