Article published in: Interpreting
Vol. 18:2 (2016), pp. 225–252
Test validation in interpreter certification performance testing
An argument-based approach
Published online: 1 November 2016
https://doi.org/10.1075/intp.18.2.04han
Over the past decade, interpreter certification performance testing has gained momentum. Certification tests often involve high
stakes, since they can play an important role in regulating access to professional practice and serve to provide a measure of
professional competence for end users. The decision to award certification is based on inferences from candidates’ test scores
about their knowledge, skills and abilities, as well as their interpreting performance in a given target domain. To justify the
appropriateness of score-based inferences and actions, test developers need to provide evidence that the test is valid and
reliable through a process of test validation. However, there is little evidence that test qualities are systematically evaluated
in interpreter certification testing. In an attempt to address this problem, this paper proposes a theoretical argument-based
validation framework for interpreter certification performance tests so as to guide testers in carrying out systematic validation
research. Before the framework is presented, validity theory is reviewed and the argument-based approach to
validation is examined. A validity argument for interpreter tests is then proposed, with hypothesized validity evidence. Examples
of evidence are drawn from relevant empirical work, where available. Gaps in the available evidence are highlighted and
suggestions for research are made.
