In: Applications of Pattern-driven Methods in Corpus Linguistics
Edited by Joanna Kopaczyk and Jukka Tyrkkö
[Studies in Corpus Linguistics 82] 2018
pp. 107–130
Chapter 5. Constance and variability
Using PoS-grams to find phraseologies in the language of newspapers
Published online: 13 March 2018
https://doi.org/10.1075/scl.82.05pin
Abstract
This paper describes a corpus-driven methodology, the retrieval of part-of-speech-grams (PoS-grams), which is effective for discovering phraseologies that might otherwise remain hidden. A PoS-gram is a string of part-of-speech categories (Stubbs 2007: 91); its tokens are the strings of words that have been annotated with those PoS tags. A list of PoS-grams retrieved from a sample corpus can be compared with a list retrieved from a reference corpus. Statistically significant items are then analysed further to identify recurrent patterns and potential phraseologies. The utility of PoS-grams is illustrated through an analysis of the Sassari Newspaper Article Corpus (SNAC), a one-million-token corpus composed of texts from ten sections of The Guardian.
Keywords: PoS-grams, phraseology, journalism, corpus-driven
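The workflow summarised in the abstract (retrieve PoS-grams, compare sample and reference lists, keep the statistically significant items) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's actual procedure: the abstract does not name the significance statistic, so log-likelihood is assumed here, and the function names, the tag labels and the 3.84 cut-off (the chi-square critical value for p < 0.05 with one degree of freedom) are illustrative choices. The input is assumed to be already tagged, as a list of (word, tag) pairs.

```python
import math
from collections import Counter, defaultdict


def pos_grams(tagged, n):
    """Yield (tag_tuple, word_tuple) for every window of n tagged tokens."""
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        yield tuple(tag for _, tag in window), tuple(word for word, _ in window)


def count_pos_grams(tagged, n):
    """Count each PoS-gram and the word strings (tokens) that realise it."""
    counts = Counter()
    tokens = defaultdict(Counter)
    for tags, words in pos_grams(tagged, n):
        counts[tags] += 1
        tokens[tags][words] += 1
    return counts, tokens


def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood for frequency a in corpus A versus b in corpus B.

    Assumed statistic: the abstract only says 'statistically significant'.
    """
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_a)
    if b > 0:
        ll += b * math.log(b / expected_b)
    return 2 * ll


def key_pos_grams(sample, reference, n, threshold=3.84):
    """Return PoS-grams over-represented in the sample, strongest first.

    Each result is (tags, score, up to three most frequent word strings).
    """
    s_counts, s_tokens = count_pos_grams(sample, n)
    r_counts, _ = count_pos_grams(reference, n)
    s_total = sum(s_counts.values())
    r_total = sum(r_counts.values())
    if s_total == 0 or r_total == 0:
        return []  # one corpus is shorter than n tokens; nothing to compare
    results = []
    for tags, a in s_counts.items():
        b = r_counts.get(tags, 0)
        score = log_likelihood(a, b, s_total, r_total)
        # Keep only items that are significant AND relatively more frequent
        # in the sample than in the reference corpus.
        if score >= threshold and a / s_total > b / r_total:
            results.append((tags, score, s_tokens[tags].most_common(3)))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Toy, hand-tagged data (hypothetical; not drawn from SNAC).
    sample = [("the", "DT"), ("heart", "NN"), ("of", "IN")] * 10
    reference = [("she", "PRP"), ("walked", "VBD"), ("home", "RB")] * 10
    for tags, score, examples in key_pos_grams(sample, reference, n=3):
        print(" ".join(tags), round(score, 2), examples)
```

The word strings stored alongside each PoS-gram are what the analyst would then inspect by hand for recurrent patterns; the statistic only ranks the candidate PoS-grams.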
Article outline
- 1. Introduction
- 2. Materials and methods
- 3. Results and discussion
- 3.1 Travel PoS-grams
- 3.2 Crime PoS-grams
- 3.3 Obituaries PoS-grams
- 4. Conclusions
- Notes
- References
- Appendix
References (29)
Baron, Alistair, Rayson, Paul, & Archer, Dawn. 2009. Word frequency and key word statistics in historical corpus linguistics. Anglistik: International Journal of English Studies 20(1): 41–67.
Biber, Douglas. 2009. A corpus-driven approach to formulaic language: Multi-word patterns in speech and writing. International Journal of Corpus Linguistics 14(3): 381–417.
Biber, Douglas & Barbieri, Federica. 2007. Lexical bundles in university spoken and written registers. English for Specific Purposes 26(3): 263–286.
Biber, Douglas, Conrad, Susan & Cortes, Viviana. 2004. If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics 25(3): 371–405.
Biber, Douglas, Johansson, Stig, Leech, Geoffrey, Conrad, Susan & Finegan, Edward. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education.
Cheng, Winnie, Greaves, Chris, Sinclair, John McH., & Warren, Martin. 2009. Uncovering the extent of the phraseological tendency: Towards a systematic analysis of Concgrams. Applied Linguistics 30(2): 236–252.
Cheng, Winnie, Greaves, Chris & Warren, Martin. 2006. From n-gram to skipgram to concgram. International Journal of Corpus Linguistics 11(4): 411–433.
D’hondt, Eva K. L., Verberne, Suzan, Weber, Niklas, Koster, Kees & Boves, Lou. 2012. Using skipgrams and PoS-based feature selection for patent classification. Computational Linguistics in the Netherlands Journal 2: 52–70.
Fletcher, William. 2002–2007. kfNgram. Annapolis MD: USNA. <[URL]> (10 June 2016).
Francis, Gill. 1993. A corpus-driven approach to grammar – principles, methods and examples. In Text and Technology. In Honour of John Sinclair, Mona Baker, Gill Francis & Elena Tognini-Bonelli (eds), 137–156. Amsterdam: John Benjamins.
Francis, Gill, Hunston, Susan & Manning, Elizabeth. 1998. Grammar Patterns, 2: Nouns and Adjectives. London: HarperCollins.
Gray, Bethany & Biber, Douglas. 2013. Lexical Frames in Academic Prose and Conversation. International Journal of Corpus Linguistics 18(1): 109–135.
Greaves, Chris & Warren, Martin. 2010. What can a corpus tell us about multi-word units? In The Routledge Handbook of Corpus Linguistics, Anne O’Keeffe & Michael McCarthy (eds), 212–226. London: Routledge.
Hunston, Susan & Francis, Gill. 2000. Pattern Grammar: A Corpus Driven Approach to the Lexical Grammar of English [Studies in Corpus Linguistics 4]. Amsterdam: John Benjamins.
Hunston, Susan & Sinclair, John McH. 2000. A local grammar of evaluation. In Evaluation in Text. Authorial Stance and the Construction of Discourse, Susan Hunston & Geoff Thompson (eds), 74–101. Oxford: OUP.
Hyland, Ken. 2008. As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes 27: 4–21.
Kopaczyk, Joanna. 2013. The Legal Language of Scottish Burghs. Standardization and Lexical Bundles 1380–1560. Oxford: OUP.
Martin, Jim R. & White, Peter R. R. 2005. The Language of Evaluation. Houndmills: Palgrave Macmillan.
Morley, Barry & Sift, Patricia. 2006. Towards the automatic identification of directive speech acts. In Corpus-based Studies of Diachronic English [Linguistic Insights 31], Roberta Facchinetti & Matti Rissanen (eds), 95–112. Bern: Peter Lang.
Philip, Gill. 2008. Reassessing the canon: ‘Fixed’ phrases in general reference corpora. In Phraseology. An Interdisciplinary Perspective, Sylviane Granger & Fanny Meunier (eds), 95–108. Amsterdam: John Benjamins.
Reyes, Antonio & Rosso, Paolo. 2012. Making objective decisions from subjective data: Detecting irony in customer reviews. Decision Support Systems 53: 754–760.
Spiccia, Carmelo, Augello, Agnese & Pilato, Giovanni. 2015. Posgram driven word prediction. In Proceedings of the 7th International Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Vol. 1 [IC3K 2015], Ana Fred, Jan Dietz, David Aveiro, Kecheng Liu & Joaquim Filipe (eds), 589–596. Lisbon.
Stubbs, Michael. 2007. An example of frequent English phraseology: Distributions, structures and functions. In Corpus Linguistics 25 Years on, Roberta Facchinetti (ed.), 89–105. Amsterdam: Rodopi.
Stubbs, Michael & Barth, Isabel. 2003. Using recurrent phrases as text-type discriminators: A quantitative method and some findings. Functions of Language 10(1): 61–104.
Cited by (2)
Drury, Brett & Samuel Morais Drury
Clarke, Isobelle, Tony McEnery & Gavin Brookes. 2021. Multiple Correspondence Analysis, newspaper discourse and subregister. Register Studies 3(1): 144 ff.
This list is based on CrossRef data as of 1 December 2025 and may not be complete.
