Article published in: Australian Review of Applied Linguistics
Vol. 17:2 (1994), pp. 77–103
Patterns of rater behaviour in the assessment of an oral interaction test
Published online: 1 January 1994
https://doi.org/10.1075/aral.17.2.04wig
Abstract
Lack of inter-rater agreement in the assessment of oral tests is well known. In this paper, multi-faceted Rasch analysis was used to determine whether any bias was evident in the way a group of raters (N=13) rated two different versions of an oral interaction test, undertaken by the same candidates (N=83) under two conditions: direct (live) and semi-direct (tape-mediated). Rasch measurement allows analysis of the interaction between ‘facets’; in this case, raters, items and candidates are all facets. In this study, the interaction between rater and item was investigated in order to determine whether particular tasks in the test were scored in a consistently biased way by particular raters. The results of the analysis indicated that certain raters consistently assessed the tape version of the test more harshly, whilst others consistently rated the live version more harshly. This type of approach also allowed a finer analysis at the level of individual items with respect to harshness and consistency across ratings. The implications for rater training and feedback are discussed.
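The abstract does not reproduce the measurement model itself, but the many-facet Rasch model underlying this kind of analysis is conventionally written as below. The notation (B_n for candidate ability, D_i for item difficulty, C_j for rater severity, F_k for the rating-scale step threshold) follows common usage in the Rasch measurement literature rather than the paper, so this is a general sketch rather than the authors' exact specification:

\log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k

Here P_{nijk} is the probability that candidate n receives a rating in category k, rather than k-1, from rater j on item i. The rater-by-item bias analysis described above can then be understood as estimating an additional interaction term C_{ji} from the residuals of this model; a significantly non-zero estimate of C_{ji} flags rater j as systematically harsh or lenient on item i.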
