Article In: Concentric: Online-First Articles
Boosting LLM performance with generative question-answer pairs via wh-transformation
Abstract
This study explores methods to enhance the performance of offline Large Language Models (LLMs) using generative
question-answer (QA) pairs. Existing research highlights the effectiveness of example-based prompts and QA pairs in improving LLM
robustness and contextual understanding (Takahashi, Omi, Arima & Ishigaki 2023; Chowdhury & Chadha 2024). However, generating
domain-specific QA pairs remains challenging due to the scarcity of datasets across diverse industrial sectors. To address this
issue, we advance an innovative and adaptive approach that employs Generative Grammar (Chomsky 1957 et seq.) to convert
industry-specific statements into questions, thereby facilitating QA pair creation. We compare the efficacy of this method with
that of LLM-generated QA pairs. Our proposed approach not only reduces the labor-intensive process typically associated with
prompt engineering but also provides a transparent and systematic framework for question generation through controlled
wh-movement transformations. Initial findings indicate that QA pairs generated via these transformational
rules substantially enhance LLM performance in industrial chatbot applications by enriching contextual information,
highlighting promising directions for future LLM research and downstream applications.
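The abstract's core idea, deriving a question from a declarative statement by replacing a target constituent with a wh-word and fronting it (wh-movement), can be sketched in a few lines. The following toy Python version is illustrative only: it is not the article's Articut/Loki pipeline, and the rule table, function names, and example sentence are all assumptions introduced here for exposition.

```python
import re

# Simplified, hypothetical mapping from constituent types to wh-words;
# the article's actual system is built on Articut/Loki, not this code.
WH_WORDS = {"person": "who", "thing": "what", "time": "when", "place": "where"}

def wh_transform(subject, aux, predicate, focus, focus_type):
    """Turn a declarative clause into a wh-question: replace the focused
    constituent with a wh-word, front it (wh-movement), and apply
    subject-auxiliary inversion."""
    wh = WH_WORDS[focus_type]
    # Delete the focused constituent from the predicate, leaving a gap.
    remainder = re.sub(r"\s+", " ", predicate.replace(focus, "")).strip()
    return f"{wh.capitalize()} {aux} {subject} {remainder}".strip() + "?"

def make_qa_pair(subject, aux, predicate, focus, focus_type):
    """Pair the derived question with the focused constituent as its answer."""
    statement = f"{subject.capitalize()} {aux} {predicate}."
    question = wh_transform(subject, aux, predicate, focus, focus_type)
    return {"question": question, "answer": focus, "source": statement}

pair = make_qa_pair(
    subject="the filter",
    aux="should",
    predicate="be replaced every three months",
    focus="every three months",
    focus_type="time",
)
print(pair["question"])  # When should the filter be replaced?
print(pair["answer"])    # every three months
```

Because the transformation is rule-driven rather than sampled from a model, every generated question is traceable to the statement it came from, which is the transparency property the abstract claims for the approach.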
Keywords: question-answer pairs, generative grammar, wh-transformation
Article outline
- 1. Introduction
- 2. Empirical challenges in question-generation by LLMs
- 3. Proposal
- 3.1 The TR of wh-questions
- 3.2 Articut and Loki
- 3.3 Procedure
- 3.4 Experiment results
- 3.5 Discussion
- 4. Concluding remarks
- Notes
- List of abbreviations
References
Chen, Shuangshuang. 2024. Resolving
Chinese anaphora with ChatGPT. Proceedings of the 2024 International Conference on
Asian Language Processing (IALP), ed. by Rui Liu, Lei Wang, Feilong Bao, Yanfeng Lu, Cunhang Fan and Minghui Dong, 31–36. New York: Institute of Electrical and Electronics Engineers.
Cheng, Lisa Lai-Shen. 1991. On
the Typology of Wh-questions. Doctoral
dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Chomsky, Noam. 1957. Syntactic
Structures. The Hague, Netherlands: Mouton.
. 1970. Remarks
on nominalization. Readings in English Transformational
Grammar, ed. by Roderick Jacobs and Peter Rosenbaum, 184–221. Washington, D.C.: Georgetown UP.
. 1973. Conditions
on transformations. A Festschrift for Morris Halle, ed.
by Stephen R. Anderson and Paul Kiparsky, 232–286. New York: Holt, Rinehart & Winston.
Chowdhury, Arijit Ghosh, and Aman Chadha. 2024. Generative
data augmentation using LLMs improves distributional robustness in question
answering. Proceedings of the 18th Conference of the European Chapter of
the Association for Computational Linguistics: Student Research Workshop, ed.
by Neele Falk, Sara Papi and Mike Zhang, 258–265. Stroudsburg, PA: Association for Computational Linguistics.
Chung, Meng-Hsuan, and Chao-Ting Tim Chou. 2025. Climbing
towards the NLU of the universal reading of shei
‘who’. Concentric 51.2:303–348.
Huang, Cheng-Teh James. 1982. Logical Relations in Chinese
and the Theory of Grammar. Doctoral
Dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Li, Aijun. 2002. Chinese
prosody and prosodic labeling of spontaneous speech. Proceedings of Speech Prosody
2002, ed. by Bernard Bel and Isabelle Marlien, 39–46. Aix-en-Provence, France: Laboratoire Parole et Langage, Université de Provence.
Li, Shengnan, Weiguang Qu, Tingxin Wei, Junsheng Zhou, Yanhui Gu, and Bin Li. 2021. A
survey of Chinese anaphora resolution. Artificial Intelligence and Security: 7th
International Conference (ICAIS 2021), ed. by Xingming Sun, Xiaorui Zhang, Zhihua Xia and Elisa Bertino, 180–192. Dublin, Ireland: Springer International Publishing.
Li, Yen-Hui Audrey. 1992. Indefinite
wh in Mandarin Chinese. Journal of East Asian
Linguistics 1.2:125–155.
Lin, Jo-Wang. 1996. Polarity
Licensing and Wh-phrase Quantification in Chinese. Doctoral
dissertation, University of Massachusetts at Amherst, Amherst, MA.
. 1998. On
existential polarity wh-phrases in Chinese. Journal of East Asian
Linguistics 7.3:219–255.
Lu, Sin-En, Bo-Han Lu, Chao-Yi Lu, and Richard Tzong-Han Tsai. 2022. Exploring
methods for building dialects-Mandarin code-mixing corpora: A case study in Taiwanese
Hokkien. Findings of the Association for Computational
Linguistics (EMNLP-2022), ed.
by Yoav Goldberg, Zornitsa Kozareva and Yue Zhang, 6287–6305. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics.
Pater, Joe. 2019. Generative
linguistics and neural networks at 60: Foundation, friction, and
fusion. Language 95.1:41–74.
Rajpurkar, Pranav, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD:
100,000+ questions for machine comprehension of text. Proceedings of the 2016
Conference on Empirical Methods in Natural Language Processing (EMNLP-2016), ed.
by Jian Su, Kevin Duh and Xavier Carreras, 2383–2392. Austin, TX: Association for Computational Linguistics.
Stowell, Tim. 1981. Origins
of Phrase Structure. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Sukthanker, Rhea, Soujanya Poria, Erik Cambria, and Ramkumar Thirunavukarasu. 2020. Anaphora
and coreference resolution: A review. Information
Fusion 59:139–162.
Takahashi, Kosuke, Takahiro Omi, Kosuke Arima, and Tatsuya Ishigaki. 2023. Training
Generative Question-answering on Synthetic Data Obtained from an Instruct-tuned
Model. Retrieved September
27, 2024, from [URL]
Tsai, Wei-Tien Dylan. 1994. On Economizing the Theory of
A-Bar Dependencies. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Wang, Wen-jet, Chia-jung Chen, Chia-ming Lee, Chien-yu Lai, and Hsin-hung Lin. 2019a. Articut:
Chinese Word Segmentation and POS Tagging System (Version
274) [Computer program]. Retrieved October 3, 2024 from [URL]
. 2019b. Linguistics-Oriented
Keyword Interface NLU System (Version 4.0) [Computer
program]. Retrieved October 3,
2024 from [URL]
Wolfram, Stephen. 1985. Analytical
and Empirical Mathematics with Computers (Report No. 8540). Princeton, NJ: The Institute for Advanced Study.
Zhou, Lexin, Wout Schellaert, Fernando Martínez-Plumed, Yael Moros-Daval, Cèsar Ferri, and José Hernández-Orallo. 2024. Larger
and more instructable language models become less
reliable. Nature 634:61–68.
Zhu, Peide, Zhen Wang, Claudia Hauff, Jie Yang, and Avishek Anand. 2022. Answer
quality aware aggregation for extractive QA crowdsourcing. Findings of the Association
for Computational Linguistics (EMNLP 2022), ed. by Yoav Goldberg, Zornitsa Kozareva and Yue Zhang, 6147–6159. Stroudsburg, PA: The Association for Computational Linguistics.