Article published in: International Journal of Corpus Linguistics
Vol. 24:2 (2019), pp. 169–201
Variation and change in a specialized register
A comparison of random and sociolinguistic sampling outcomes in Desert Island Discs
Published online: 5 August 2019
https://doi.org/10.1075/ijcl.17117.smi
Abstract
Corpus-based studies of specialized registers typically sample texts using random methods as far as possible, but
they disregard social characteristics of the speakers/writers. In contrast, in corpus-based studies of conversation and
quantitative sociolinguistic studies, sampling is more typically designed to optimize social representation. To our knowledge,
this study is the first to compare linguistic outcomes from random versus sociolinguistic sampling in a specialized register. Our
data comes from the biographical radio chat show, Desert Island Discs (DID), at different points
in time. We constructed two versions of a DID corpus: a sociolinguistic judgment sample based on guest
demographics, and a random sample. We compare grammatical usage between them using an inductive (‘key POS-tags’) method and close
manual analysis, uncovering some evidence of significant grammatical differences between the samples and differing patterns of
diachronic change. We discuss the implications of our research for corpus design, representativeness and analysis in specialized
registers.
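The 'key POS-tags' comparison mentioned above can be illustrated with a minimal sketch. It assumes a two-corpus log-likelihood (G2) statistic as the keyness measure, which the abstract does not specify; the function names, the CLAWS-style tag labels, and all counts are hypothetical placeholders rather than the authors' data or procedure.

```python
import math


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Two-corpus log-likelihood (G2) for a single tag.

    freq_a / freq_b: occurrences of the tag in sample A / sample B.
    total_a / total_b: total tag tokens in sample A / sample B.
    """
    combined = freq_a + freq_b
    grand_total = total_a + total_b
    expected_a = total_a * combined / grand_total
    expected_b = total_b * combined / grand_total
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def key_tags(counts_a, counts_b, threshold=3.84):
    """Return (tag, G2, overused_in) for tags whose G2 meets the threshold.

    3.84 is the chi-square critical value for p < 0.05 with 1 d.f.;
    the cut-off is an assumption of this sketch.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    results = []
    for tag in sorted(set(counts_a) | set(counts_b)):
        freq_a = counts_a.get(tag, 0)
        freq_b = counts_b.get(tag, 0)
        g2 = log_likelihood(freq_a, total_a, freq_b, total_b)
        if g2 >= threshold:
            # Direction of the difference, judged by relative frequency.
            overused_in = "A" if freq_a / total_a > freq_b / total_b else "B"
            results.append((tag, round(g2, 2), overused_in))
    # Strongest differences first.
    return sorted(results, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical POS-tag counts for a judgment sample (A) and a random sample (B).
    socio_sample = {"RR": 520, "VVD": 900, "NN1": 3000}
    random_sample = {"RR": 400, "VVD": 910, "NN1": 3010}
    for tag, g2, overused_in in key_tags(socio_sample, random_sample):
        print(tag, g2, overused_in)
```

A list of this kind only flags candidate tags. As the abstract notes, the study pairs the inductive step with close manual analysis.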
Article outline
- 1. Introduction
- 2. Corpus sampling and representativeness in register-based and sociolinguistic studies
- 3. Data and methods
- 3.1 Desert Island Discs: Register characteristics and suitability for the study
- 3.2 Corpus design
- 3.2.1 The Socio sample
- i. Gender
- ii. Age
- iii. Occupation
- iv. Education
- 3.2.2 The Random sample
- 3.3 Analytical methods
- 3.3.1 Grammatical analysis: Key POS-tag comparisons
- 3.3.2 Frequency measures
- 4. Results
- 4.1 Overview of synchronic and diachronic key POS-tag comparisons
- 4.2 Case study 1: Adverbs
- 4.3 Case study 2: Verb tenses
- 5. Discussion
- 6. Conclusion
- Acknowledgements
- Notes
- References
