In: Developing, Modelling and Assessing Second Languages
Edited by Jörg-U. Keßler, Anke Lenzing and Mathias Liebner
[Processability Approaches to Language Acquisition Research & Teaching 5] 2016
pp. 207–238
The cognitive processes elicited by L2 listening test tasks – A validation study
Available under the Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 29 June 2016
https://doi.org/10.1075/palart.5.10ros
This paper investigates the validity of a listening comprehension test that was developed for a large-scale assessment project. The study draws on qualitative data, employing a think-aloud technique and stimulated recall interviews. The informants (n=18) were purposefully and randomly sampled from a group (n=121) of year 9 learners (ages 14–16) of English as a foreign language (EFL) in German schools. Subjects were asked to think aloud while they were solving the multiple-choice items of the listening test. Construct-relevant and -irrelevant processes were identified and analysed with regard to their distribution across the two subsamples and their relative contribution to correct item responses. The results provide validity evidence for the listening tests in general. A few test items, however, were shown to elicit test-taking processes and strategies that compromise the measurement outcomes.
