Article published in: English World-Wide
Vol. 46:1 (2025), pp. 93–121
Treebanks and World Englishes
A Singapore English perspective
Published online: 21 March 2025
https://doi.org/10.1075/eww.23069.hua
Abstract
Treebanks (parsed corpora) play an important role in linguistic research, but creating high-quality parses can be
very labor-intensive. This paper discusses the prospects of creating such parses in the context of New Englishes and what kinds of
research insights parses can deliver. We present Singapore English as a case study. We suggest that despite the many
contact-derived lexical and grammatical properties of Singapore English, it is quite feasible to apply an off-the-shelf American
English parser to generate parses of Singapore English. In addition, we present an exploratory analysis of noun phrases in a
Singapore English treebank, to illustrate the potential of parses and treebanks in research on World Englishes.
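To make concrete what a parse is and how noun phrases might be pulled out of one, here is a minimal sketch in Python. It assumes Penn Treebank-style bracketed output, which is the kind of output the Stanford Parser can produce. The example sentence, the parse, and the helper names (read_tree, leaves, noun_phrases) are hypothetical illustrations; they are not taken from the paper's treebank or method.

```python
# Minimal reader for Penn Treebank-style bracketed parses (illustrative only).

def read_tree(text):
    """Parse a bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]          # category label, e.g. S, NP, NN
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                     # skip the closing ")"
        return (label, children)

    return parse()


def leaves(node):
    """Return the words dominated by a node, in order."""
    if isinstance(node, str):
        return [node]
    _, children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def noun_phrases(node):
    """Yield the word string of every NP in the tree, outermost first."""
    if isinstance(node, str):
        return
    label, children = node
    if label == "NP":
        yield " ".join(leaves(node))
    for child in children:
        yield from noun_phrases(child)


# Hypothetical sentence and parse, invented for this sketch.
parse = "(S (NP (DT the) (NN parser)) (VP (VBZ handles) (NP (JJ new) (NNS varieties))))"
tree = read_tree(parse)
nps = list(noun_phrases(tree))
print(nps)
```

In practice a library reader such as NLTK's tree class would likely replace the hand-written read_tree; the stdlib version is used here only to keep the sketch self-contained.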
Article outline
- 1. Introduction
- 2. Background
- 2.1 Singapore English
- 2.1.1 Borrowing of content and function words
- 2.1.2 English-origin words with novel uses
- 2.1.3 Borrowing of constructions distinguished by their syntax
- 2.2 Parser software
- 2.3 Applying the Stanford Parser to SgE corpus material
- 3. Analysis of Stanford Parser performance
- 3.1 Parsing content word borrowings
- 3.2 Parsing grammatical borrowings
- 3.2.1 English-origin words with novel uses
- 3.2.2 Function word borrowings
- 3.2.3 Constructions with only distinctive syntax
- 3.3 Improving parsing accuracies
- 3.3.1 Providing the parser with part of speech tags
- 3.3.2 Training the parser on hand-corrected parses
- 3.4 Interim summary
- 4. An exploratory analysis of noun phrases in SgE
- 4.1 Data and analysis
- 4.2 Results
- 4.3 Future research directions using parses
- 5. Conclusion
- Acknowledgments
- Notes
