In: Research Methods in Cognitive Translation and Interpreting Studies
Edited by Ana María Rojo López and Ricardo Muñoz Martín
[Research Methods in Applied Linguistics 10] 2025
pp. 183–212
Chapter 8: Speech recording
Published online: 1 April 2025
https://doi.org/10.1075/rmal.10.08ahr
Abstract
Speech recording has always been the main method for working with spoken language, since it is an easy way to transfer data
from volatile to non-volatile memory for linguistic analysis. In translation and interpreting research, recordings of
interpreters’ performances have been used since the beginning of empirical research into interpreting. This chapter offers an
overview of speech recording as a method in T&I research by introducing key concepts and software solutions for research into
spoken language. It outlines common variables, paradigms and processing steps, and surveys some noteworthy programs for
speech recording, audio editing, speech analysis, automatic speech recognition and other tasks.
Article outline
- 1. The method and key questions
- 1.1 The state of the art in speech recording
- 1.2 Ethical issues in speech recording
- 2. Conceptual aspects
- 2.1 Variables in speech recording
- 2.2 Data measurement in speech recording
- 2.3 Pre-processing steps enabling analysis
- 2.3.1 Speech recognition
- 2.3.2 Alignment
- 3. Implementation
- 3.1 Speech recording applications
- 3.1.1 Open source audio editors
- 3.1.2 Speech analysis software
- 3.1.3 Suites of programs
- 3.1.4 Automatic speech recognition tools
- 3.1.5 Other tools
- 3.2 Data collection and reporting
- 3.2.1 Data collection sources
- 3.2.2 Data triangulation
- 3.2.3 Data reporting
- 4. Closing remarks
- 4.1 Advantages and disadvantages of speech recording
- 4.2 Emerging challenges in speech recording
- Further readings on speech recording
- References
