Article published in: Journal of Historical Pragmatics
Vol. 26:3 (2025), pp. 406–437
Assessing the potential of using large language models for pragmatic annotation of historical texts
A case of epistemic stance in Early Modern English
Published online: 4 December 2025
https://doi.org/10.1075/jhp.25011.hua
Abstract
This study investigates the viability of using large language models (LLMs) to conduct pragmatic
annotation of historical texts. Using a small corpus of witness depositions, it compares the performance of Claude 3.5 Sonnet,
an LLM that excels in reasoning over text, with that of two human annotators in the pragmatic annotation
of Early Modern English (EModE) texts. The study also compares the model’s annotations on modernised and
original versions of the corpus to explore whether EModE spelling variation affects its performance. The results
show that although the model’s annotations were less satisfactory than the human annotators’, it achieved moderate inter-coder
agreement and balanced precision and recall, which is desirable in this task because it maximises identification without
sacrificing accuracy. Furthermore, the prevalent spelling variation did not significantly impair the model’s ability to recognise
epistemic stance in the original EModE texts. We therefore propose a human–AI collaboration approach
for historical pragmatic annotation.
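The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that inter-coder agreement is measured with Cohen's kappa and that precision and recall are computed against one human annotator treated as the reference; the abstract specifies neither choice. The function names and the toy labels are invented for illustration and are not the study's data.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    # Proportion of items on which the two coders agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[k] * freq_b[k] for k in labels) / (n * n)
    if expected == 1.0:  # both coders used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def precision_recall_f1(gold, predicted, positive=1):
    """Precision, recall and F1 of `predicted` against `gold` for one label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy labels: 1 = item marked as epistemic stance, 0 = not marked.
human = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
model = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

kappa = cohen_kappa(human, model)
precision, recall, f1 = precision_recall_f1(human, model)
print(f"kappa={kappa:.3f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On these toy labels the two coders agree on 8 of 10 items, chance agreement is 0.52, and kappa is therefore about 0.58, which a commonly used rule of thumb would label moderate. Precision and recall are both 0.75, the kind of balance the abstract describes: the model neither over-marks nor under-marks relative to the reference annotator.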
Article outline
- 1. Introduction
- 2. Background and literature review
- 2.1 Epistemic stance in Early Modern English
- 2.2 Challenges in pragmatic analysis of historical text with a corpus linguistic approach
- 2.3 Application of large language models to historical texts
- 2.4 Pragmatic annotation with large language models
- 2.5 Common approaches in prompt engineering
- 3. Methods and procedures
- 3.1 Source of data
- 3.2 LLM annotation procedure and prompt design
- 3.3 LLM annotation protocol
- 4. Results
- 4.1 Large language model versus human annotators
- 4.2 Performance of Claude 3.5 Sonnet in annotating Early Modern English texts in original spelling
- 4.3 Error analysis
- 4.3.1 Annotating the epistemic use of modals
- 4.3.2 Annotating the epistemic use of the emphatic
- 4.3.3 Annotating certainty/likelihood verbs and communication verbs
- 4.3.4 Annotating certainty and likelihood adjectives and adverbs
- 5. Discussion
- 6. Conclusion
- Notes
- References
