References
Bock, R. D., & Zimowski, M. F. (1997). Multiple group IRT. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). Springer.
Braun, H. I. (1988). A new approach to avoiding problems of scale in interpreting trends in mental measurement data. Journal of Educational Measurement, 25(3), 171–191.
Briggs, D. C., & Domingue, B. (2013). The gains from vertical scaling. Journal of Educational and Behavioral Statistics, 38(6), 551–576.
Briggs, D. C., & Weeks, J. P. (2009). The impact of vertical scaling decisions on growth interpretations. Educational Measurement: Issues and Practice, 28(4), 3–14.
Carlson, J. E. (2010). Statistical models for vertical linking. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 59–70). Springer.
Crocker, L., & Algina, J. (1986). Introduction to modern and classical test theory. Holt, Rinehart, and Winston.
Deng, W., & Monfils, L. (2017). Long-term impact of valid case criterion on capturing population-level growth under item response theory equating (ETS Research Report Series No. RR–17–17). ETS.
Haberman, S. J. (2012). A general program for item-response analysis that employs the stabilized Newton-Raphson algorithm (Unpublished manuscript). ETS.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.
Hanson, B. A., & Béguin, A. A. (1999). Separate versus concurrent estimation of IRT item parameters in the common item equating design (ACT Research Report Series, 99–8). ACT.
Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1), 3–24.
Harris, D. J. (1991). A comparison of Angoff’s Design I and Design II for vertical equating using traditional and IRT methodology. Journal of Educational Measurement, 28(3), 221–235.
Harris, D. J. (2007). Practical issues in vertical scaling. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 233–251). Springer.
Harris, D. J., & Hoover, H. D. (1987). An application of the three-parameter IRT model to vertical equating. Applied Psychological Measurement, 11(2), 151–159.
Holland, P. W. (2007). A framework and history for score linking. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 5–30). Springer.
Hoskens, M., Lewis, D. M., & Patz, R. J. (2003). Maintaining vertical scalings using a common item design. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Ito, K., Sykes, R. C., & Yao, L. (2008). Concurrent and separate grade-groups linking procedures for vertical scaling. Applied Measurement in Education, 21(3), 187–206.
Kenyon, D. M., MacGregor, D., Li, D., & Cook, H. G. (2011). Issues in vertical scaling of a K–12 English language proficiency test. Language Testing, 28(3), 383–400.
Kim, S.-H., & Cohen, A. S. (1998). A comparison of linking and concurrent calibration under item response theory. Applied Psychological Measurement, 22(2), 131–143.
Kolen, M. J. (1981). Comparison of traditional and item response theory methods of equating tests. Journal of Educational Measurement, 18(1), 1–11.
Kolen, M. J. (2006). Scaling and norming. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 156–186). American Council on Education; Praeger.
Kolen, M. J. (2011). Issues associated with vertical scales for PARCC assessments. Retrieved on 6 February 2023 from [URL]
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer.
Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6(1), 83–102.
Lord, F. M. (1975). The ‘ability’ scale in item characteristic curve theory. Psychometrika, 40(2), 205–217.
Martineau, J. A. (2006). Distorting value added: The use of longitudinal, vertically scaled student achievement data for growth-based, value-added accountability. Journal of Educational and Behavioral Statistics, 31(1), 35–62.
Masters, G. N., & Wright, B. D. (1997). The partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 101–122). Springer.
McNamara, T. F. (1996). Measuring second language performance. Longman.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.
Patz, R. J., & Yao, L. (2007). Methods and models for vertical scaling. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 253–272). Springer.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221–262). Macmillan.
Reckase, M. D. (2009). Multidimensional item response theory. Springer.
Reckase, M. D. (2010). Study of best practices for vertical scaling and standard setting with recommendations for FCAT 2.0. [URL]
Skaggs, G., & Lissitz, R. W. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56(4), 495–529.
Skaggs, G., & Lissitz, R. W. (1988). Effect of examinee ability on test equating invariance. Applied Psychological Measurement, 12(1), 69–82.
Slinde, J. A., & Linn, R. L. (1979). A note on vertical equating via the Rasch model for groups of quite different ability and tests of quite different difficulty. Journal of Educational Measurement, 16, 159–165.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Lawrence Erlbaum Associates.
Tomkowicz, J., Zhang, L., & Yen, S. (2010). Comparison of vertical scaling maintenance methods and their impact on scale properties. Paper presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
Tong, Y., & Kolen, M. J. (2010). Scaling: An ITEMS module. Educational Measurement: Issues and Practice, 29(4), 39–48.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.
Wu, R. Y., & Liao, C. H. (2010). Establishing a common score scale for the GEPT Elementary, Intermediate, and High-Intermediate Level listening and reading tests. In T. Kao & Y. Li (Eds.), A new look at language teaching and testing: English as subject and vehicle – Selected papers from the 2009 LTTC International Conference on English Language Teaching and Testing (pp. 309–329). Language Training and Testing Center.
Yen, W. M. (1986). The choice of scale for educational measurement: An IRT perspective. Journal of Educational Measurement, 23(4), 299–325.
Yen, W. M. (2007). Vertical scaling and No Child Left Behind. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 273–283). Springer.
Yen, W. M., & Fitzpatrick, A. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–153). American Council on Education; Praeger.
Young, M. J. (2006). Vertical scales. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 469–485). Lawrence Erlbaum Associates.