Article published in: Journal of Second Language Pronunciation
Vol. 11:1 (2025), pp. 46–75
Exploring large language models for L2 metaphonological awareness tutoring
Published online: 2 June 2025
https://doi.org/10.1075/jslp.24030.lod
Abstract
This is the first observational study to evaluate the feasibility of implementing large language models (LLMs) for
second language (L2) metaphonological awareness training. A custom implementation of GPT-4 acting as a homework tutor was piloted
in an English phonetics and phonology course for first-year university students. Two novel homework assignments were designed to
leverage the LLM’s strengths and explore its weaknesses. Analysis of learner interaction logs, homework reflections, and survey
data revealed that most learners perceived the AI Tutor as helpful for its personalised explanations. However, the overall
sentiment was mixed due to the LLM’s propensity for confabulation. Despite these challenges, the pilot demonstrated the potential
for LLMs to engage learners in active and self-regulated learning. Recommendations for future directions include designing
LLM-based learning environments, promoting AI literacy among educators and learners, and experimentally researching long-term
effects of AI tutors on learning outcomes.
Keywords: CAPT, artificial intelligence, AI, L2 pronunciation.
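The abstract describes a custom implementation of GPT-4 acting as a homework tutor. As a purely illustrative aside, a minimal sketch of how such a tutor might be configured in the OpenAI chat format is shown below; the system prompt wording, function name, and parameter choices are hypothetical assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch of a custom GPT-4 homework tutor configuration.
# The prompt text, helper name, and settings are illustrative only.

TUTOR_SYSTEM_PROMPT = (
    "You are a patient tutor for a first-year university course in "
    "English phonetics and phonology. Explain concepts such as "
    "allophones, word stress, and transcription step by step, and "
    "ask the learner to attempt an answer before revealing it."
)

def build_tutor_request(learner_message, history=None, model="gpt-4"):
    """Assemble a chat-completion payload for an AI Tutor turn."""
    messages = [{"role": "system", "content": TUTOR_SYSTEM_PROMPT}]
    messages.extend(history or [])  # prior turns from the interaction log
    messages.append({"role": "user", "content": learner_message})
    return {"model": model, "messages": messages, "temperature": 0.3}

# Example learner query, as might appear in an interaction log
request = build_tutor_request(
    "Why is the vowel in 'bead' longer than in 'beat'?"
)
```

In a deployment like the one piloted here, the returned payload would be sent to the chat-completions endpoint, with each exchange appended to `history` so the tutor can personalise its explanations across turns.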
Article outline
- 1. Introduction
- 1.1 Prior use of AI in second language pronunciation acquisition
- 1.2 Potential of large language models as tutors
- 1.3 Research questions
- 2. Study design
- 2.1 Course profile
- 2.2 Participants
- 2.3 AI tutor design
- 2.4 Pilot deployment
- 2.5 Activity design
- 2.6 Data collection
- 2.7 Data analysis
- 3. Results
- 3.1 General usage patterns
- 3.2 Homework activity 1
- 3.3 Homework activity 2
- 3.4 Post-pilot survey
- 4. Discussion
- 4.1 RQ1: Implementation feasibility
- 4.2 RQ2: Accuracy and helpfulness
- 4.3 RQ3: Active and self-regulated learning
- 4.4 Limitations
- 4.5 Future directions
- 4.5.1 Custom implementations
- 4.5.2 Raising teacher and learner awareness
- 4.5.3 Educational research
- 5. Conclusion
- Acknowledgements