Research Article
A Comparison of Score Equating Conducted Using Haebara and Stocking Lord Method for Polytomous

Risky Setiawan

APA 7th edition
Setiawan, R. (2019). A comparison of score equating conducted using Haebara and Stocking Lord method for polytomous. European Journal of Educational Research, 8(4), 1071-1079.



This research has two purposes: 1) to compare test equating conducted with the Haebara and Stocking-Lord methods; and 2) to describe the characteristics of each equating method using the Windows-based IRTEQ program. The research employs a participatory approach, as the data are collected through questionnaires based on the 2018 National Examination administration. The sample is divided into group A and group B, with 449 and 502 respondents respectively. This paper discusses how to equate tests through shared (anchor) items, using a set of instruments consisting of 35 questionnaire items and 6 shared items. In addition, the researcher uses PARSCALE to estimate each respondent's ability and each item's characteristics. The shared items are then equated using the IRTEQ program. The results show a significant difference between the two methods: the Haebara method (0.592) produces a larger mean-sigma value than the Stocking-Lord method (0.00213). The results also suggest that the shared test items may improve discrimination among respondents and increase the difficulty level (parameter b). Because shared items are available, it is appropriate to equate two different tests at different ability (theta) levels.
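To make the linking step concrete, the following is a minimal Python sketch of the three quantities named above, assuming a graded response model for the polytomous anchor items. It is not the IRTEQ or PARSCALE implementation. The item parameters, the function names, the evenly spaced unweighted theta grid, and the one-directional form of both criteria are assumptions made for illustration, and none of the numbers come from the study.

```python
import math
from statistics import mean, pstdev

def mean_sigma(b_x, b_y):
    """Mean-sigma linking constants (A, B) that map the form X scale onto form Y."""
    A = pstdev(b_y) / pstdev(b_x)
    B = mean(b_y) - A * mean(b_x)
    return A, B

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded response model item."""
    # Cumulative curves: 1 below the first threshold, 0 above the last.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def rescale(item, A, B):
    """Put one form X item (a, thresholds) on the form Y scale."""
    a, thresholds = item
    return a / A, [A * b + B for b in thresholds]

def haebara_loss(items_x, items_y, A, B, grid):
    """Haebara-type criterion: squared gaps between category curves, item by item."""
    total = 0.0
    for theta in grid:
        for item_x, item_y in zip(items_x, items_y):
            p_x = grm_category_probs(theta, *rescale(item_x, A, B))
            p_y = grm_category_probs(theta, *item_y)
            total += sum((py - px) ** 2 for py, px in zip(p_y, p_x))
    return total / len(grid)

def expected_score(theta, items):
    """Test characteristic curve: expected summed score over the anchor items."""
    return sum(k * p
               for a, thresholds in items
               for k, p in enumerate(grm_category_probs(theta, a, thresholds)))

def stocking_lord_loss(items_x, items_y, A, B, grid):
    """Stocking-Lord-type criterion: squared gap between test characteristic curves."""
    total = 0.0
    for theta in grid:
        t_x = expected_score(theta, [rescale(item, A, B) for item in items_x])
        t_y = expected_score(theta, items_y)
        total += (t_y - t_x) ** 2
    return total / len(grid)

# Hypothetical anchor items on the form Y scale: (discrimination, thresholds).
items_y = [(1.0, [-1.0, 0.0, 1.2]),
           (1.4, [-0.5, 0.6, 1.5]),
           (0.8, [-1.5, -0.2, 0.9])]

# Build the same items on a form X scale by undoing a known transformation.
TRUE_A, TRUE_B = 1.2, 0.5
items_x = [(a * TRUE_A, [(b - TRUE_B) / TRUE_A for b in th]) for a, th in items_y]

# Mean-sigma uses only the threshold (b) parameters of the anchor items.
b_x = [b for _, th in items_x for b in th]
b_y = [b for _, th in items_y for b in th]
A_hat, B_hat = mean_sigma(b_x, b_y)

grid = [-4.0 + 0.5 * i for i in range(17)]  # unweighted theta grid (assumption)
print("mean-sigma constants:", A_hat, B_hat)
print("Haebara loss at mean-sigma solution:",
      haebara_loss(items_x, items_y, A_hat, B_hat, grid))
print("Stocking-Lord loss at mean-sigma solution:",
      stocking_lord_loss(items_x, items_y, A_hat, B_hat, grid))
```

In practice both characteristic-curve methods search for the pair (A, B) that minimizes their criterion, usually with weights taken from the ability distribution. The sketch only evaluates the two criteria at the mean-sigma solution, which is enough to show that the Haebara criterion compares curves item by item while the Stocking-Lord criterion compares the summed test curve.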

Keywords: Equating, polytomous, graded data.

