Article published in: Interpreting
Vol. 26:1 (2024), pp. 24–54
From manual to machine
Evaluating automated ear–voice span measurement in simultaneous interpreting
Published online: 15 January 2024
https://doi.org/10.1075/intp.00100.guo
Abstract
This study introduces a groundbreaking automated methodology for measuring ear–voice span (EVS) in simultaneous
interpreting (SI). Traditionally, assessing EVS – a critical temporal metric in SI – has been hampered by labour-intensive and
time-consuming manual methods that are prone to inconsistency. To overcome these challenges, our research harnesses
state-of-the-art natural language processing (NLP) technologies, including automatic speech recognition (ASR), sentence boundary
detection (SBD) and cross-lingual alignment, to automate EVS measurement. We deployed a comprehensive array of NLP models and
evaluated the automated pipelines on a 20-hour English-to-Portuguese SI corpus which featured 57 varied audio pairings. The
findings are encouraging: the most effective model combination achieved a median EVS error of less than 0.1 seconds across the
corpus. Moreover, the automated pipelines exhibited a high level of accuracy, strong correlation and substantial agreement with
manual measurements when assessing median EVS for individual audio pairs. Despite these satisfactory results, certain challenges
persist with some NLP models, indicating clear avenues for future research. This study not only introduces a groundbreaking
approach to large-scale EVS measurement but also propels the automation of process analysis in Interpreting Studies.
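The quantity being measured can be illustrated with a minimal sketch. It assumes that ASR, SBD and cross-lingual alignment have already produced pairs of onset times (source sentence, interpreted sentence) in seconds, and that EVS is the target onset minus the source onset. The function names, the onset values and the reading of "median EVS error" as the absolute difference between automated and manual medians are illustrative assumptions, not the pipeline or the data evaluated in the article.

```python
from statistics import median


def ear_voice_spans(aligned_pairs):
    """Return the EVS in seconds for each (source_onset, target_onset) pair."""
    return [target - source for source, target in aligned_pairs]


def median_evs(aligned_pairs):
    """Median EVS for one audio pair (one speech and its interpretation)."""
    return median(ear_voice_spans(aligned_pairs))


# Hypothetical onset times in seconds, invented for illustration only.
automated = [(0.0, 2.4), (5.1, 8.0), (11.3, 13.9)]  # from an automated pipeline
manual = [(0.0, 2.5), (5.0, 8.0), (11.3, 14.0)]     # from manual annotation

# Assumed error definition: absolute difference between the two medians.
median_error = abs(median_evs(automated) - median_evs(manual))
print(f"automated median EVS: {median_evs(automated):.2f} s")
print(f"manual median EVS:    {median_evs(manual):.2f} s")
print(f"median EVS error:     {median_error:.2f} s")
```

Working with medians rather than means keeps a single misaligned sentence pair from dominating the summary for an audio pair.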
Article outline
- Introduction
- 1. Ear–voice span measurement in simultaneous interpreting
- 1.1 Methods and tools for ear–voice span measurement
- 1.2 Statistical techniques in ear–voice span measurement
- 1.3 Innovations in ear–voice span measurement
- 2. Natural language processing technologies for ear–voice span measurement
- 2.1 Automatic speech recognition models
- 2.2 Sentence boundary detection models
- 2.3 Cross-lingual alignment models
- 3. Data collection and preparation
- 3.1 Compilation of the simultaneous interpreting corpus focused on ear–voice span
- 3.2 Stratified corpus sampling for manual validation
- 4. Methodology
- 4.1 Automated pipeline for ear–voice span measurement
- 4.2 Manual annotation of ear–voice span
- 4.3 Manual validation of pipeline components of natural language processing
- 4.4 Data-preprocessing and -analysis techniques
- 5. Results
- 5.1 Comparative analysis of manual and automated ear–voice span measurement approaches
- 5.2 Evaluation of automatic speech recognition, sentence boundary detection and cross-lingual alignment
- 6. Discussion
- 6.1 Performance of automated pipelines for ear–voice span measurement
- 6.2 Performance of pipeline components
- 6.3 Implications and limitations
- 7. Conclusion
- Acknowledgements
- Notes