Article published in: Australian Review of Applied Linguistics
Vol. 17:2 (1994), pp. 77–103
Patterns of rater behaviour in the assessment of an oral interaction test
Published online: 1 January 1994
https://doi.org/10.1075/aral.17.2.04wig
Abstract
Lack of inter-rater agreement in the assessment of oral tests is well known. In this paper, multi-faceted Rasch analysis was used to determine whether any bias was evident in the way a group of raters (N=13) rated two different versions of an oral interaction test, undertaken by the same candidates (N=83) under two conditions: direct (live) and semi-direct (tape-mediated). Rasch measurement allows analysis of the interaction between ‘facets’; in this case, raters, items and candidates are all facets. In this study, the interaction between rater and item was investigated in order to determine whether particular tasks in the test were scored in a consistently biased way by particular raters. The results of the analysis indicated that certain raters consistently assessed the tape version of the test more harshly, whilst others consistently rated the live version more harshly. This type of approach also allowed a finer analysis at the level of individual items with respect to harshness and consistency across ratings. The implications for rater training and feedback are discussed.
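The abstract does not reproduce the measurement model itself, but the many-facet Rasch model underlying this kind of analysis is conventionally written as below. The notation (B_n for candidate ability, D_i for item difficulty, C_j for rater severity, F_k for the rating-scale step threshold) follows common usage in the Rasch measurement literature rather than the paper, so this is a general sketch rather than the authors' exact specification:

\log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k

Here P_{nijk} is the probability that candidate n receives a rating in category k, rather than k-1, from rater j on item i. The rater-by-item bias analysis described above can then be understood as estimating an additional interaction term C_{ji} from the residuals of this model; a significantly non-zero estimate of C_{ji} flags rater j as systematically harsh or lenient on item i.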
