Article published in: International Journal of Corpus Linguistics
Vol. 18:4 (2013), pp. 506–535
A new computing method for extracting contiguous phraseological sequences from academic text corpora
Published online: 5 December 2013
https://doi.org/10.1075/ijcl.18.4.03wei
This study develops a new computing method for extracting contiguous phraseological sequences (PSs) of various lengths from academic text corpora by measuring the internal associations of n-grams. We construct a new normalizing algorithm that uses a probability-weighted average to refine the MI measure and improve precision in extracting PSs from corpora. The method is applied to a medium-sized text corpus of academic English. Results indicate that the new MI measure provides statistics which better reveal the internal associations within an n-gram, regardless of its size. Lexico-grammatical sequences extracted with this method are more complete and less arbitrary in terms of grammar and semantics. The method can be applied to a variety of linguistic phenomena, ranging from well-established phrases to likely phrasal entities, and thus has potential practical applications in corpus-based studies of phraseology and in natural language processing.
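For orientation, the kind of association statistic the abstract refers to can be sketched as follows. This is only the conventional, unrefined MI score extended to n-grams, MI = log2(P(w1...wn) / (P(w1) × ... × P(wn))), and not the probability-weighted normalization proposed in the article, whose formula is not given in the abstract. The function name `ngram_mi` and the toy token list are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def ngram_mi(tokens, n):
    """Score every contiguous n-gram with plain (unnormalized) MI.

    Assumption: MI = log2(P(ngram) / product of unigram probabilities).
    This is the baseline measure, not the article's refined one.
    """
    unigrams = Counter(tokens)
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total_unigrams = sum(unigrams.values())
    total_ngrams = sum(ngrams.values())

    scores = {}
    for gram, freq in ngrams.items():
        p_gram = freq / total_ngrams
        # Probability of the sequence if its words were independent.
        p_independent = math.prod(unigrams[w] / total_unigrams for w in gram)
        scores[gram] = math.log2(p_gram / p_independent)
    return scores


# Toy data (hypothetical), far too small for meaningful statistics.
tokens = "on the other hand the results on the other hand suggest".split()
scores = ngram_mi(tokens, 2)
for gram, mi in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(" ".join(gram), round(mi, 2))
```

Because the product of unigram probabilities shrinks quickly as n grows, raw scores of this kind are not directly comparable across n-gram lengths, which is the sort of problem a normalizing step is meant to address.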
