Automatic discourse segmentation of L1 and L2 spoken English transcripts

Natural language processing (NLP) tools, primarily trained on L1 written English, have achieved remarkable performance but are rarely applied to L2 learner data. This study uses a rule-based segmenter to automatically segment spoken English discourse produced by both L1 speakers and learners, and presents novel preparatory data-cleaning steps that combine a state-of-the-art disfluency detector with additional rules to improve segmentation performance. In three successive segmentation tests on data from the Louvain Corpus of Native English Conversation (LOCNEC) and the Louvain International Database of Spoken English Interlanguage (LINDSEI), we achieve improved segmentation performance that is similar for the L1 and L2 data (.84). Our approach highlights the effectiveness of leveraging existing NLP tools to process disfluent L2 spoken transcripts, facilitating automatic discourse analysis in Learner Corpus Research (LCR). The code for executing our pipeline is publicly available for future research.

Keywords: L2 spoken discourse, automatic discourse segmentation, automatic disfluency removal, stance detection

Article outline

1. Introduction
- 1.1 Segmentation principles
- 1.2 Existing segmenters
2. Implementation
- 2.1 Corpora
- 2.2 Segmentation Test 1
  - 2.2.1 Procedure
  - 2.2.2 Results and discussion
- 2.3 Segmentation Test 2 with disfluency removal
  - 2.3.1 Methodological overview
  - 2.3.2 Procedure
  - 2.3.3 Results and discussion
- 2.4 Segmentation Test 3 with hand-crafted rules
3. Conclusion
Open code badge
Notes
References

References (66)

Bach, N., & Huang, F. (2019). Noisy BiLSTM-based models for disfluency detection. Proceedings of Interspeech 2019, 4230–4234.

Bhat, S., & Yoon, S. Y. (2015). Automatic assessment of syntactic complexity for spontaneous speech scoring. Speech Communication, 67, 42–57.

Biber, D., Gray, B., & Staples, S. (2016). Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics, 37(5), 639–668.

Caines, A., & Buttery, P. (2014). The effect of disfluencies and learner errors on the parsing of spoken learner language. In Y. Goldberg, Y. Marton, I. Rehbein, Y. Versley, Ö. Çetinoğlu, & J. Tetreault (Eds.), Proceedings of the first joint workshop on statistical parsing of morphologically rich languages and syntactic analysis of non-canonical languages (pp. 74–81). Dublin City University. Retrieved from [URL]

Carlson, L., Okurowski, M. E., & Marcu, D. (2002). RST discourse treebank. Linguistic Data Consortium.

Chambers, L., & Ingham, K. (2011). The BULATS online speaking test. Research Notes, 43, 21–25. Retrieved from [URL]

Charniak, E., & Johnson, M. (2001). Edit detection and parsing for transcribed speech. Second Meeting of the North American Chapter of the Association for Computational Linguistics. NAACL 2001. Retrieved from [URL]

Chen, M., & Zechner, K. (2011). Computing and evaluating syntactic complexity features for automated scoring of spontaneous non-native speech. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 722–731). Association for Computational Linguistics.

Cieri, C., Graff, D., Kimball, O., Miller, D., & Walker, K. (2004). Fisher English training speech part 1 transcripts LDC2004T19. Linguistic Data Consortium.

Cieri, C., Graff, D., Kimball, O., Miller, D., & Walker, K. (2005). Fisher English training speech part 2 transcripts LDC2005T19. Linguistic Data Consortium.

Cresti, E. (1995). Speech act units and informational units. In E. Fava (Ed.), Speech acts and linguistic research (pp. 89–107). Proceedings of the Workshop, Center for Cognitive Science of New York at Buffalo.

De Cock, S. (2004). Preferred sequences of words in NS and NNS speech. Belgian Journal of English Language and Literatures (BELL), New Series, 2, 225–246.

Dong, Q., Wang, F., Yang, Z., Chen, W., Xu, S., & Xu, B. (2019). Adapting translation models for transcript disfluency detection. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 6351–6358.

Feng, V. W., & Hirst, G. (2014). Two-pass discourse segmentation with pairing and global features. CoRR, abs/1407.8215. Retrieved from [URL]

Foster, P., Tonkyn, A., & Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons. Applied Linguistics, 21(3), 354–375.

Gilquin, G., De Cock, S., & Granger, S. (2010). The Louvain International Database of Spoken English Interlanguage: Handbook and CD-ROM. Presses universitaires de Louvain.

Godfrey, J. J., Holliman, E. C., & McDaniel, J. (1992). SWITCHBOARD: Telephone speech corpus for research and development. Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92) (Vol. 1, pp. 517–520). IEEE.

Guhr, O., Schumann, A.-K., Bahrmann, F., & Böhme, H. J. (2021). FullStop: Multilingual deep models for punctuation prediction. Proceedings of the Swiss Text Analytics Conference 2021. CEUR Workshop Proceedings. Retrieved from [URL]

Himmelmann, N. P. (2006). The challenges of segmenting spoken language. In J. Gippert, N. P. Himmelmann, & U. Mosel (Eds.), Essentials of language documentation (pp. 253–274). Mouton De Gruyter.

Hirschberg, J., & Litman, D. (1993). Empirical studies on the disambiguation of cue phrases. Computational Linguistics, 19(3), 501–530.

Hoek, J., Evers-Vermeul, J., & Sanders, T. J. M. (2018). Segmenting discourse: Incorporating interpretation into segmentation? Corpus Linguistics and Linguistic Theory, 14(2), 357–386.

Honnibal, M., & Johnson, M. (2014). Joint incremental disfluency detection and dependency parsing. Transactions of the Association for Computational Linguistics, 2, 131–142.

Hough, J., & Schlangen, D. (2015). Recurrent neural networks for incremental disfluency detection. Proceedings of Interspeech 2015, 849–853.

Izumi, E., Uchimoto, K., & Isahara, H. (2004). The NICT JLE Corpus: Exploiting the language learners’ speech database for research and education. The International Journal of the Computer, the Internet and Management, 12, 119–125.

Johnson, M., & Charniak, E. (2004). A TAG-based noisy channel model of speech repairs. Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics (pp. 33–39). Association for Computational Linguistics.

Joty, S., Carenini, G., & Ng, R. T. (2015). Codra: A novel discriminative framework for rhetorical analysis. Computational Linguistics, 41(3), 385–435.

Kahane, S., Caron, B., Strickland, E., & Gerdes, K. (2021). Annotation guidelines of UD and SUD treebanks for spoken corpora: a proposal. In D. Dakota, K. Evang, & S. Kübler (Eds.), Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021) (pp. 35–47). Association for Computational Linguistics.

Knill, K. M., Gales, M. J., Manakul, P. P., & Caines, A. P. (2019). Automatic grammatical error detection of non-native spoken learner English. ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8127–8131). IEEE.

Kyle, K., Eguchi, M., Miller, A., & Sither, T. (2022). A dependency treebank of spoken second language English. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022) (pp. 39–45). Association for Computational Linguistics.

Kyle, K., & Eguchi, M. (2024). Evaluating NLP models with written and spoken L2 samples. Research Methods in Applied Linguistics, 3(2), 100120.

Le Thanh, H., Abeysinghe, G., & Huyck, C. (2004). Automated discourse segmentation by syntactic information and cue phrases. Proceedings of the IASTED International Conference on Artificial Intelligence and Applications (AIA 2004), Innsbruck, Austria (pp. 411–415). IASTED.

Lou, P. J., & Johnson, M. (2017). Disfluency detection using a noisy channel model and a deep neural language model. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Volume 2: Short Papers (pp. 547–553). Association for Computational Linguistics.

Lou, P. J., & Johnson, M. (2020). Improving disfluency detection by self-training a self-attentive model. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 3754–3763). Association for Computational Linguistics.

Lu, Y., Gales, M. J. F., Knill, K. M., Manakul, P., & Wang, Y. (2019). Disfluency detection for spoken learner English. Proceedings of the 8th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2019) (pp. 74–78).

Lu, Y., Gales, M. J. F., & Wang, Y. (2020). Spoken language ‘grammatical error correction’. Proceedings of Interspeech 2020, 3840–3844.

Mann, W., & Thompson, S. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text — Interdisciplinary Journal for the Study of Discourse, 8(3), 243–281.

Meurers, D. (2015). Learner corpora and natural language processing. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 537–566). Cambridge University Press.

Moore, R., Caines, A., Graham, C., & Buttery, P. (2015). Incremental dependency parsing and disfluency detection in spoken learner English. In P. Král & V. Matoušek (Eds.), Text, Speech, and Dialogue: TSD 2015 (Vol. 9302, pp. 470–479). Springer.

Oberländer, L., & Klinger, R. (2020). Token sequence labelling vs. clause classification for English emotion stimulus detection. Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics (pp. 58–70). Association for Computational Linguistics.

Ostendorf, M., & Hahn, S. (2013). A sequential repetition model for improved disfluency detection. Proceedings of Interspeech 2013, 2624–2628.

Passonneau, R. J., & Litman, D. (1997). Discourse segmentation by human and automated means. Computational Linguistics, 23(1), 103–139.

Pietrandrea, P., Kahane, S., Lacheret, A., & Sabio, F. (2014). The notion of sentence and other discourse units in corpus annotation. In T. Raso & H. Mello (Eds.), Spoken corpora and linguistic studies (pp. 331–364). John Benjamins.

Polanyi, L. (1988). A formal model of the structure of discourse. Journal of Pragmatics, 12(5–6), 601–638.

Qian, X., & Liu, Y. (2013). Disfluency detection using multi-step stacked learning. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 820–825). NAACL.

Rocholl, J., Zayats, V., Walker, D., Murad, N., Schneider, A., & Liebling, D. (2021). Disfluency detection with unlabeled data and small BERT models. Proceedings of Interspeech 2021, 766–770.

Römer, U., Roberson, A., O’Donnell, M. B., & Ellis, N. C. (2014). Linking learner corpus and experimental data in studying second language learners’ knowledge of verb-argument constructions. ICAME Journal, 38(1), 115–135.

Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(4), 696–735.

Sanders, T., & Wijk, C. (1996). PISA — A procedure for analyzing the structure of explanatory texts. Text & Talk, 16(1), 91–132.

Schilperoord, J., & Verhagen, A. (1998). Conceptual dependency and the clausal structure of discourse. In J. Koenig (Ed.), Discourse and cognition: bridging the gap (pp. 141–163). CSLI Publications.

Shriberg, E. E. (1994). Preliminaries to a theory of speech disfluencies [Unpublished doctoral dissertation]. University of California at Berkeley.

Skidmore, L. (2022). Incremental disfluency detection for spoken learner English (Doctoral dissertation). University of Sheffield.

Skidmore, L., & Moore, R. (2022). Incremental disfluency detection for spoken learner English. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022) (pp. 272–278). Association for Computational Linguistics.

Soricut, R., & Marcu, D. (2003). Sentence level discourse parsing using syntactic and lexical information. Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (pp. 228–235). NAACL 2003.

Stede, M. (2012). Small discourse units and coherence relations. In G. Hirst (Ed.), Discourse processing (pp. 79–127). Springer International Publishing.

Stede, M. (2020). Automatic argumentation mining and the role of stance and sentiment. Journal of Argumentation in Context, 9(1), 19–41.

Subba, R., & Di Eugenio, B. (2007). Automatic discourse segmentation using neural networks. Proceedings of the 11th Workshop on the Semantics and Pragmatics of Dialogue (pp. 189–190). SEMDIAL.

Tofiloski, M., Brooke, J., & Taboada, M. (2009). A syntactic and lexical-based discourse segmenter. In K.-Y. Su, J. Su, J. Wiebe, & H. Li (Eds.), Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (pp. 77–80). Association for Computational Linguistics.

Van Enschot, R., Spooren, W., van den Bosch, A., Burgers, C., Degand, L., Evers-Vermeul, J., … & Maes, A. (2024). Taming our wild data: On intercoder reliability in discourse research. Dutch Journal of Applied Linguistics, 13, 1–24.

Van Hest, E., Poulisse, N., & Bongaerts, T. (1997). Self-repair in L1 and L2 production: an overview. International Journal of Applied Linguistics, 117(1), 85–115.

Wang, Y., Li, S., & Yang, J. (2018). Toward fast and accurate neural discourse segmentation. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 962–967). Association for Computational Linguistics.

Wierszycka, J. (2013). Phrasal verbs in learner English: a semantic approach. A study based on a POS-tagged spoken corpus of learner English. Research in Corpus Linguistics, 1, 81–93.

Wu, S., Zhang, D., Zhou, M., & Zhao, T. (2015). Efficient disfluency detection with transition-based parsing. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Volume 1: Long Papers (pp. 495–503). Association for Computational Linguistics.

Yu, J., Zhang, L., Wu, S., & Zhang, B. (2017). Rhythm and disfluency: Interactions in Chinese L2 English speech. 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), 1–6.

Zayats, V., Ostendorf, M., & Hajishirzi, H. (2016). Disfluency detection using a bidirectional LSTM. Proceedings of Interspeech 2016, 2523–2527.

Zirn, C., Niepert, M., Stuckenschmidt, H., & Strube, M. (2011). Fine-grained sentiment analysis with structural features. Proceedings of 5th International Joint Conference on Natural Language Processing (pp. 336–344). Asian Federation of Natural Language Processing.

Zwarts, S., & Johnson, M. (2011). The impact of language models and loss functions on repair disfluency detection. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 703–711). Association for Computational Linguistics.