Article published in: Linguistics in the Netherlands 2024
Edited by Marco Bril and Kristel Doreleijers
[Nota Bene 1:2] 2024
pp. 242–260
ChatGPT as an informant
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Open Access publication of this article was funded through a Transformative Agreement with Utrecht University.
Published online: 24 January 2025
https://doi.org/10.1075/nb.00015.mul
Abstract
While previous machine-learning protocols have failed to achieve even observational adequacy in acquiring natural language, generative large language models (LLMs) now produce large amounts of free text with few grammatical errors. This is surprising in view of what is known as “the logical problem of language acquisition”: given the likely absence of negative evidence in the training process, how would an LLM acquire the information that certain strings are to be avoided as ill-formed? We attempt to employ Dutch-speaking ChatGPT as a linguistic informant by capitalizing on the documented “few-shot learning” ability of LLMs. We then investigate whether ChatGPT has acquired familiar island constraints, in particular the Complex NP Constraint (CNPC), and compare its performance to that of native speakers. Although descriptive and explanatory adequacy may remain out of reach, initial results indicate that ChatGPT performs well above chance in detecting island violations.
Article outline
- 1. Introduction
- 2. Test methodology: ChatGPT as an informant
- 3. Experiment 1: ChatGPT
- 3.1 Language models
- 3.2 Materials
- Set 1: Baseline examples
- Set 2: English island violations
- Set 3: Dutch island violations
- 3.3 Procedure
- 3.4 Predictions
- 3.5 Analysis, results and discussion
- 4. Experiment 2: Humans versus few-shot GPT-4
- 4.1 Participants
- 4.2 Materials
- 4.3 Procedure
- 4.4 Analysis and results
- 4.4.1 Humans
- 4.4.2 Humans versus GPT-4 with a few-shot prompt
- 5. Conclusions
- Supplementary materials and data archive
- Acknowledgements
- Notes
References
Baker, C. LeRoy. 1979. Syntactic theory and the projection problem. Linguistic Inquiry 10(4). 533–581.
Bates, Douglas, Martin Maechler, Ben Bolker & Steve Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1). 1–48.
Berwick, Robert C., Paul Pietroski, Beracah Yankama & Noam Chomsky. 2011. Poverty of the stimulus revisited. Cognitive Science 35. 1207–1242.
Braine, Martin D. S. 1971. On two types of models of the internalization of grammars. In D. I. Slobin (ed.), The Ontogenesis of Grammar: A Theoretical Symposium. New York: Academic Press.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan et al. 2020. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan & H. Lin (eds.), Advances in Neural Information Processing Systems 33.
Clark, Alexander & Rémi Eyraud. 2007. Polynomial identification in the limit of substitutable context-free languages. Journal of Machine Learning Research 8. 1725–1745.
Dentella, Vittoria, Fritz Günther & Evelina Leivada. 2023. Systematic testing of three Language Models reveals low language accuracy, absence of response stability, and a yes-response bias. Proceedings of the National Academy of Sciences 120(51).
Fox, John & Sanford Weisberg. 2019. An R Companion to Applied Regression, third edition. Thousand Oaks, CA: Sage. [URL]
Hamaker, Ellen L., Pascal van Hattum, Rebecca M. Kuiper & Herbert Hoijtink. 2011. Model selection based on information criteria in multilevel modeling. Handbook of Advanced Multilevel Analysis, 231–255.
Hornstein, Norbert & David Lightfoot (eds.). 1981. Explanation in Linguistics: The Logical Problem of Language Acquisition. Longman.
Hu, Jennifer, Kyle Mahowald, Gary Lupyan, Anna Ivanova & Roger Levy. 2024. Language models align with human judgments on key grammatical constructions.
Huang, C.-T. James. 1982. Logical Relations in Chinese and the Theory of Grammar. Doctoral dissertation, MIT.
Huijbregts, Riny. 2008. Linguistic Argumentation and Poverty of the Stimulus Arguments. Ms., Utrecht University.
Kojima, Takeshi, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo & Yusuke Iwasawa. 2022. Large Language Models are zero-shot reasoners.
Laurence, Stephen & Eric Margolis. 2001. The poverty of the stimulus argument. The British Journal for the Philosophy of Science 52(2). 217–276.
Lenth, Russell V. 2023. emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.9.0. [URL]
Ozaki, Satoru, Dan Yurovsky & Lori Levin. 2022. How well do LSTM language models learn filler-gap dependencies? Proceedings of the Society for Computation in Linguistics (SCiL) 5(1). 76–88.
Pinheiro, José, Douglas Bates & R Core Team. 2023. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-164. [URL]
Pinker, Steven. 1986. Productivity and conservatism in language acquisition. In W. Demopoulos & A. Marras (eds.), Language Learning and Concept Acquisition: Foundational Issues, 54–79. Norwood, NJ: Ablex.
Posit team. 2024. RStudio: Integrated Development Environment for R. Boston, MA: Posit Software, PBC. [URL]
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. [URL]
Reali, Florencia & Morten H. Christiansen. 2005. Uncovering the richness of the stimulus: Structure dependence and indirect statistical evidence. Cognitive Science 29. 1007–1028.
Ross, John R. 1967. Constraints on Variables in Syntax. Doctoral dissertation, MIT, Cambridge, MA. Reprinted as Infinite Syntax! Norwood, NJ: Ablex, 1986.
Sprouse, Jon, Matt Wagers & Colin Phillips. 2012. A test of the relation between working-memory capacity and syntactic island effects. Language 88(1). 82–123.
