Article published in: Journal of Second Language Pronunciation
Vol. 4:2 (2018), pp. 182–207
Directions for the future of technology in pronunciation research and teaching
Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 1 February 2019
https://doi.org/10.1075/jslp.17001.obr
Abstract
This paper reports on the role of technology in state-of-the-art pronunciation research and instruction, and makes concrete
suggestions for future developments. The point of departure for this contribution is that the goal of second language (L2)
pronunciation research and teaching should be enhanced comprehensibility and intelligibility as opposed to native-likeness. Three
main areas are covered here. We begin with a presentation of advanced uses of pronunciation technology in research with a special
focus on the expertise required to carry out even small-scale investigations. Next, we discuss the nature of data in pronunciation
research, pointing to ways in which future work can build on advances in corpus research and crowdsourcing. Finally, we consider
how these insights pave the way for researchers and developers working to create research-informed, computer-assisted
pronunciation teaching resources. We conclude with predictions for future developments.
Article outline
- 1. Introduction
- 2. Current uses of technology in pronunciation research
- 2.1 Freeware
- 2.2 Automatic speech recognition (ASR)
- 2.3 Text to speech
- 2.4 The cloud
- 3. Data collection
- 3.1 Spoken learner corpora
- 3.2 Collecting data through crowdsourcing
- 4. Computer-assisted pronunciation teaching (CAPT)
- 4.1 Carrying out research on CAPT
- 4.2 Examples of effective CAPT and research
- 4.2.1 Visual tools
- 4.2.2 ASR tools
- 5. Encouraging collaboration
- 6. Future directions
- Acknowledgements
- Note
- References
