Article published in: Journal of Second Language Pronunciation
Vol. 11:3 (2025), pp. 394–422

References
Alphacephei. (2025). Vosk speech recognition toolkit. [URL]
Baevski, A., Zhou, H., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.
Baker, A. (2014). Exploring teachers’ knowledge of second language pronunciation techniques: Teacher cognitions, observed classroom practices, and student perceptions. TESOL Quarterly, 48(1), 136–163.
Cámara-Arenas, E., Tejedor-García, C., Tomas-Vázquez, C. J., & Escudero-Mancebo, D. (2023). Automatic pronunciation assessment vs. automatic speech recognition: A study of conflicting conditions for L2-English. Language Learning & Technology, 27(1), 1–19. [URL]
Crowther, D., Trofimovich, P., Saito, K., & Isaacs, T. (2018). Linguistic dimensions of L2 accentedness and comprehensibility vary across speaking tasks. Studies in Second Language Acquisition, 40(2), 443–457.
Dai, Y., & Wu, Z. (2023). Mobile-assisted pronunciation learning with feedback from peers and/or automatic speech recognition: A mixed-methods study. Computer Assisted Language Learning, 36(5–6), 861–884.
Deadman, J. (2023). Simulating realistic multiparty speech data: For the development of distant microphone ASR systems. [Doctoral dissertation, University of Sheffield]. [URL]
Derwing, T. M., & Munro, M. J. (1997). Accent, intelligibility, and comprehensibility: Evidence from four L1s. Studies in Second Language Acquisition, 19(1), 1–16.
Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly, 39(3), 379–397.
Dizon, G. (2020). Evaluating intelligent personal assistants for L2 listening and speaking development. Language Learning & Technology, 24(1), 16–26.
Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments (2nd ed.). Peter Lang GmbH.
El Kheir, Y., Ali, A., & Chowdhury, S. A. (2023). Automatic pronunciation assessment: A review. Findings of the Association for Computational Linguistics: EMNLP 2023, 8304–8324. [URL].
Farrús, M. (2023). Automatic speech recognition in L2 learning: A review based on PRISMA methodology. Languages, 8(4), 242.
Ferraro, A., Galli, A., La Gatta, V., & Postiglione, M. (2023). Benchmarking open source and paid services for speech to text: An analysis of quality and input variety. Frontiers in Big Data, 6, 1210559.
Geng, H., Saito, D., & Minematsu, N. (2024). Simulating native speaker shadowing for nonnative speech assessment with latent speech representations. arXiv.
Gong, Y., Chen, Z., Chu, I.-H., Chang, P., & Glass, J. (2022). Transformer-based multi-aspect multi-granularity non-native English speaker pronunciation assessment. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7262–7266.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97.
Hirai, A., & Kovalyova, A. (2023). Using speech-to-text applications for assessing English language learners’ pronunciation: A comparison with human raters. In M.-d.-M. Suárez & W. M. El-Henawy (Eds.), Optimizing online English language learning and teaching (pp. 337–355). Springer International Publishing.
Hosseini-Kivanani, N., Gretter, R., Matassoni, M., & Falavigna, G. D. (2021). Experiments of ASR-based mispronunciation detection for children and adult English learners. arXiv.
Hsu, W.-N., Bolte, B., Tsai, Y.-H. H., Lakhotia, K., Salakhutdinov, R., & Mohamed, A. (2021). HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 3451–3460.
Inceoglu, S., Chen, W.-H., & Lim, H. (2023). Assessment of L2 intelligibility: Comparing L1 listeners and automatic speech recognition. ReCALL, 35(1), 89–104.
Isbell, D. R., & Lee, J. (2022). Self-assessment of comprehensibility and accentedness in second language Korean. Language Learning, 72(3), 806–852.
Jelinek, F. (1976). Continuous speech recognition by statistical methods. Proceedings of the IEEE, 64(4), 532–556.
Jenkins, J. (2000). The phonology of English as an international language: New models, new norms, new goals. Oxford University Press.
Jurafsky, D., & Martin, J. H. (2024). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition with language models (3rd ed.). [URL]
Kang, O., & Rubin, D. L. (2009). Reverse linguistic stereotyping: Measuring the effect of listener expectations on speech evaluation. Journal of Language and Social Psychology, 28(4), 441–456.
Karhila, R., Smolander, A.-R., Ylinen, S., & Kurimo, M. (2019). Transparent pronunciation scoring using articulatorily weighted phoneme edit distance. Proceedings of INTERSPEECH 2019, 1866–1870.
Khabbazbashi, N., Xu, J., & Galaczi, E. D. (2021). Opening the black box: Exploring automated speaking evaluation. In B. Lanteigne, C. Coombe, & J. D. Brown (Eds.), Challenges in language testing around the world (pp. 333–343). Springer.
Kheddar, H., Hemis, M., & Himeur, Y. (2024). Automatic speech recognition using advanced deep learning approaches: A survey. Information Fusion, 109, 102422.
Kim, M. (2023). Digital enhancement of pronunciation assessment: Automated speech recognition and human raters. Phonetics and Speech Sciences, 15(2), 13–20.
Kim, S.-E., Chernyak, B. R., Seleznova, O., Keshet, J., Goldrick, M., & Bradlow, A. R. (2024). Automatic recognition of second language speech-in-noise. JASA Express Letters, 4(2), 025204.
Knight, P. (2021). ‘Smart speaker, tell me about your acoustic sensor’. Physics World, 33(12), 25.
Koizumi, R., Okabe, Y., & Kashimada, Y. (2017). A multifaceted Rasch analysis of rater reliability of the speaking section of the GTEC CBT. ARELE: Annual Review of English Language Education in Japan, 28, 241–256.
Kumalija, E., & Nakamoto, Y. (2022). Performance evaluation of automatic speech recognition systems on integrated noise-network distorted speech. Frontiers in Signal Processing, 2, 999457.
Kunal, G. (2023, August 24). Advancements in automatic speech recognition (ASR): Revolutionizing speech recognition technology. [URL]
Levis, J. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly, 39(3), 369–377.
Levis, J. (2020). Revisiting the intelligibility and nativeness principles. Journal of Second Language Pronunciation, 6(3), 310–328.
Liakin, D., Cardoso, W., & Liakina, N. (2017). Mobilizing instruction in a second-language context: Learners’ perceptions of two speech technologies. Languages, 2(3), 11.
Likhomanenko, T., Xu, Q., Pratap, V., Tomasello, P., Kahn, J., Avidov, G., Collobert, R., & Synnaeve, G. (2021). Rethinking evaluation in ASR: Are our models robust enough? Proceedings of INTERSPEECH 2021, 311–315.
Linacre, J. M. (2014). A user’s guide to FACETS (Version 3.80). [URL]
Lindemann, S. (2002). Listening with an attitude: A model of native-speaker comprehension of non-native speakers in the United States. Language in Society, 31(3), 419–441.
Lounis, M., Dendani, B., & Bahi, H. (2024). Mispronunciation detection and diagnosis using deep neural networks: A systematic review. Multimedia Tools and Applications, 83, 62793–62827.
Ma, M. (2023, February 14). Speech service update: Hierarchical Transformer for pronunciation assessment. [URL]
McGuire, M. (2025). Automatic speech recognition for non-native English: Accuracy and disfluency handling. arXiv.
Meeker, M. (2017, May 31). Internet trends 2017. [URL]
Mehrish, A., Majumder, N., Bharadwaj, R., Mihalcea, R., & Poria, S. (2023). A review of deep learning techniques for speech processing. Information Fusion, 99, 101869.
Microsoft. (2024, October 6). Use pronunciation assessment. [URL]
Munro, M. J., & Derwing, T. M. (1995a). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 45(1), 73–97.
Munro, M. J., & Derwing, T. M. (1995b). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38(3), 289–306.
Munro, M. J., & Derwing, T. M. (2011). The foundations of accent and intelligibility in pronunciation research. Language Teaching, 44(3), 316–327.
NCH Software. (2022). WavePad audio editor (Version 16.01) [Computer software]. [URL]
Neri, A., Cucchiarini, C., & Strik, H. (2008). The effectiveness of computer-based speech corrective feedback for improving segmental quality in L2 Dutch. ReCALL, 20(2), 225–243.
O’Shaughnessy, D. (2024). Trends and developments in automatic speech recognition research. Computer Speech & Language, 83, 101538.
Ockey, G. J., Chukharev-Hudilainen, E., & Hirch, R. R. (2023). Assessing interactional competence: ICE versus a human partner. Language Assessment Quarterly, 20(4–5), 377–398.
Ortega, M., Mora, J. C., & Mora-Plaza, I. (2022). L2 learners’ self-assessment of comprehensibility and accentedness: Over/under-estimation, effects of rating peers, and attention to speech features. Proceedings of the 12th Pronunciation in Second Language Learning and Teaching Conference.
Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015). Librispeech: An ASR corpus based on public domain audio books. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5206–5210.
Patman, C., & Chodroff, E. (2024). Speech recognition in adverse conditions by humans and machines. JASA Express Letters, 4(11), 115204.
Pieraccini, R. (2012). The voice in the machine: Building computers that understand speech. MIT Press.
Povey, D. (2020). Librispeech ASR model. [URL]
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., & Schwarz, P. (2011). The Kaldi speech recognition toolkit. Proceedings of ASRU 2011, IEEE Signal Processing Society.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. Proceedings of the 40th International Conference on Machine Learning, 28492–28518. [URL]
Saito, K., Webb, S., Trofimovich, P., & Isaacs, T. (2016). Lexical correlates of comprehensibility versus accentedness in second language speech. Bilingualism: Language and Cognition, 19(3), 597–609.
Sun, W. (2023). The impact of automatic speech recognition technology on second language pronunciation and speaking skills of EFL learners: A mixed methods investigation. Frontiers in Psychology, 14, 1210187.
Tergujeff, E. (2021). Second language comprehensibility and accentedness across oral proficiency levels: A comparison of two L1s. System, 100, 102567.
Thi-Nhu Ngo, T., Hao-Jan Chen, H., & Kuo-Wei Lai, K. (2023). The effectiveness of automatic speech recognition in ESL/EFL pronunciation: A meta-analysis. ReCALL, 36(1), 4–21.
Thomson, R. I., & Derwing, T. M. (2015). The effectiveness of L2 pronunciation instruction: A narrative review. Applied Linguistics, 36(3), 326–344.
Trofimovich, P., & Isaacs, T. (2012). Disentangling accent from comprehensibility. Bilingualism: Language and Cognition, 15(4), 905–916.
Yu, D., & Deng, L. (2015). Automatic speech recognition: A deep learning approach. Springer.
Zhang, Y., & Ai, J. (2024). Semantic-weighted word error rate based on BERT for evaluating automatic speech recognition models. 2024 11th International Conference on Dependable Systems and Their Applications (DSA), 189–198.
Zou, B., Du, Y., Wang, Z., Chen, J., & Zhang, W. (2023). An investigation into artificial intelligence speech evaluation programs with automatic feedback for developing EFL learners’ speaking skills. Sage Open, 13(3), 21582440231193818.