Article In: Concentric: Online-First Articles
Boosting LLM performance with generative question-answer pairs via wh-transformation
Abstract
This study explores methods to enhance the performance of offline Large Language Models (LLMs) using generative
question-answer (QA) pairs. Existing research highlights the effectiveness of example-based prompts and QA pairs in improving LLM
robustness and contextual understanding (Takahashi, Omi, Arima & Ishigaki 2023; Chowdhury & Chadha 2024). However, generating
domain-specific QA pairs remains challenging due to the scarcity of datasets across diverse industrial sectors. To address this
issue, we advance an innovative and adaptive approach that employs Generative Grammar (Chomsky 1957 et seq.) to convert
industry-specific statements into questions, thereby facilitating QA pair creation. We compare the efficacy of this method with
that of LLM-generated QA pairs. Our proposed approach not only reduces the labor-intensive process typically associated with
prompt engineering but also provides a transparent and systematic framework for question generation through controlled
wh-movement transformations. Initial findings indicate that QA pairs generated via these transformational
rules substantially enhance LLM performance in industrial chatbot applications by enriching contextual information,
highlighting promising directions for future LLM research and downstream applications.
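The abstract's core idea, deriving a question from a declarative statement by replacing a target constituent with a wh-word and fronting it (wh-movement), can be sketched in a few lines. The following toy Python version is illustrative only: it is not the article's Articut/Loki pipeline, and the rule table, function names, and example sentence are all assumptions introduced here for exposition.

```python
import re

# Simplified, hypothetical mapping from constituent types to wh-words;
# the article's actual system is built on Articut/Loki, not this code.
WH_WORDS = {"person": "who", "thing": "what", "time": "when", "place": "where"}

def wh_transform(subject, aux, predicate, focus, focus_type):
    """Turn a declarative clause into a wh-question: replace the focused
    constituent with a wh-word, front it (wh-movement), and apply
    subject-auxiliary inversion."""
    wh = WH_WORDS[focus_type]
    # Delete the focused constituent from the predicate, leaving a gap.
    remainder = re.sub(r"\s+", " ", predicate.replace(focus, "")).strip()
    return f"{wh.capitalize()} {aux} {subject} {remainder}".strip() + "?"

def make_qa_pair(subject, aux, predicate, focus, focus_type):
    """Pair the derived question with the focused constituent as its answer."""
    statement = f"{subject.capitalize()} {aux} {predicate}."
    question = wh_transform(subject, aux, predicate, focus, focus_type)
    return {"question": question, "answer": focus, "source": statement}

pair = make_qa_pair(
    subject="the filter",
    aux="should",
    predicate="be replaced every three months",
    focus="every three months",
    focus_type="time",
)
print(pair["question"])  # When should the filter be replaced?
print(pair["answer"])    # every three months
```

Because the transformation is rule-driven rather than sampled from a model, every generated question is traceable to the statement it came from, which is the transparency property the abstract claims for the approach.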
Keywords: question-answer pairs, generative grammar, wh-transformation
Article outline
- 1. Introduction
- 2. Empirical challenges in question-generation by LLMs
- 3. Proposal
- 3.1 The TR of wh-questions
- 3.2 Articut and Loki
- 3.3 Procedure
- 3.4 Experiment results
- 3.5 Discussion
- 4. Concluding remarks
- Notes
- List of abbreviations
References
Chen, Shuangshuang. 2024. Resolving
Chinese anaphora with ChatGPT. Proceedings of the 2024 International Conference on
Asian Language Processing (IALP), ed. by Rui Liu, Lei Wang, Feilong Bao, Yanfeng Lu, Cunhang Fan and Minghui Dong, 31–36. New York: Institute of Electrical and Electronics Engineers.
Cheng, Lisa Lai-Shen. 1991. On
the Typology of Wh-questions. Doctoral
dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Chomsky, Noam. 1957. Syntactic
Structures. The Hague, Netherlands: Mouton.
. 1970. Remarks
on nominalization. Readings in English Transformational
Grammar, ed. by Roderick Jacobs and Peter Rosenbaum, 184–221. Washington, D.C.: Georgetown UP.
. 1973. Conditions
on transformations. A Festschrift for Morris Halle, ed.
by Stephen R. Anderson and Paul Kiparsky, 232–286. New York: Holt, Rinehart & Winston.
Chowdhury, Arijit Ghosh, and Aman Chadha. 2024. Generative
data augmentation using LLMs improves distributional robustness in question
answering. Proceedings of the 18th Conference of the European Chapter of
the Association for Computational Linguistics: Student Research Workshop, ed.
by Neele Falk, Sara Papi and Mike Zhang, 258–265. Stroudsburg, PA: Association for Computational Linguistics.
Chung, Meng-Hsuan, and Chao-Ting Tim Chou. 2025. Climbing
towards the NLU of the universal reading of shei
‘who’. Concentric 51.2:303–348.
Huang, Cheng-Teh James. 1982. Logical Relations in Chinese
and the Theory of Grammar. Doctoral
Dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Li, Aijun. 2002. Chinese
prosody and prosodic labeling of spontaneous speech. Proceedings of Speech Prosody
2002, ed. by Bernard Bel and Isabelle Marlien, 39–46. Aix-en-Provence, France: Laboratoire Parole et Langage, Université de Provence.
Li, Shengnan, Weiguang Qu, Tingxin Wei, Junsheng Zhou, Yanhui Gu, and Bin Li. 2021. A
survey of Chinese anaphora resolution. Artificial Intelligence and Security: 7th
International Conference (ICAIS 2021), ed. by Xingming Sun, Xiaorui Zhang, Zhihua Xia and Elisa Bertino, 180–192. Dublin, Ireland: Springer International Publishing.
Li, Yen-Hui Audrey. 1992. Indefinite
wh in Mandarin Chinese. Journal of East Asian
Linguistics 1.2:125–155.
Lin, Jo-Wang. 1996. Polarity
Licensing and Wh-phrase Quantification in Chinese. Doctoral
dissertation, University of Massachusetts at Amherst, Amherst, MA.
. 1998. On
existential polarity wh-phrases in Chinese. Journal of East Asian
Linguistics 7.3:219–255.
Lu, Sin-En, Bo-Han Lu, Chao-Yi Lu, and Richard Tzong-Han Tsai. 2022. Exploring
methods for building dialects-Mandarin code-mixing corpora: A case study in Taiwanese
Hokkien. Findings of the Association for Computational
Linguistics (EMNLP-2022), ed.
by Yoav Goldberg, Zornitsa Kozareva and Yue Zhang, 6287–6305. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics.
Pater, Joe. 2019. Generative
linguistics and neural networks at 60: Foundation, friction, and
fusion. Language 95.1:41–74.
Rajpurkar, Pranav, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD:
100,000+ questions for machine comprehension of text. Proceedings of the 2016
Conference on Empirical Methods in Natural Language Processing (EMNLP-2016), ed.
by Jian Su, Kevin Duh and Xavier Carreras, 2383–2392. Austin, TX: Association for Computational Linguistics.
Stowell, Tim. 1981. Origins
of Phrase Structure. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Sukthanker, Rhea, Soujanya Poria, Erik Cambria, and Ramkumar Thirunavukarasu. 2020. Anaphora
and coreference resolution: A review. Information
Fusion 59:139–162.
Takahashi, Kosuke, Takahiro Omi, Kosuke Arima, and Tatsuya Ishigaki. 2023. Training
Generative Question-answering on Synthetic Data Obtained from an Instruct-tuned
Model. Retrieved September
27, 2024, from [URL]
Tsai, Wei-Tien Dylan. 1994. On Economizing the Theory of
A-Bar Dependencies. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Wang, Wen-jet, Chia-jung Chen, Chia-ming Lee, Chien-yu Lai, and Hsin-hung Lin. 2019a. Articut:
Chinese Word Segmentation and POS Tagging System (Version
274) [Computer program]. Retrieved October 3, 2024 from [URL]
. 2019b. Linguistics-Oriented
Keyword Interface NLU System (Version 4.0) [Computer
program]. Retrieved October 3,
2024 from [URL]
Wolfram, Stephen. 1985. Analytical
and Empirical Mathematics with Computers (Report No. 8540). Princeton, NJ: The Institute for Advanced Study.
Zhou, Lexin, Wout Schellaert, Fernando Martínez-Plumed, Yael Moros-Daval, Cèsar Ferri, and José Hernández-Orallo. 2024. Larger
and more instructable language models become less
reliable. Nature 634:61–68.
Zhu, Peide, Zhen Wang, Claudia Hauff, Jie Yang, and Avishek Anand. 2022. Answer
quality aware aggregation for extractive QA crowdsourcing. Findings of the Association
for Computational Linguistics (EMNLP 2022), ed. by Yoav Goldberg, Zornitsa Kozareva and Yue Zhang, 6147–6159. Stroudsburg, PA: The Association for Computational Linguistics.