Mohammad Syaifuddin, Mifedwil Jandra Janan


Abstract. This research investigates (1) the best method of test equating under the graded response model (GRM) for items with multiple categories, (2) the minimum number of anchor items required for test equating, and (3) the effect of ability differences on mathematics test equating. The research used a common-item non-equivalent groups design and simulated data. Data were generated for a 20-item test in which each item has three to five ordered categories. The ability distribution of the base group was standard normal, and two target-group distributions were used: N(0,1) and N(1,1). Data were generated for three sample size combinations: K1-300/K2-300, K1-500/K2-500, and K1-1000/K2-1000. Fifty replications were made for each of the three sample sizes. PARSCALE 4.1 for Windows was used to estimate the item parameters of the base and target groups, and WINGEN and IRTEQT were used to generate data and to calculate the equating coefficients A and B for the Stocking & Lord, Mean & Sigma-average, and Mean & Sigma methods. ANOVA was used to test the hypotheses, and the RMSD was used to measure the quality of the test equating. The research yields four main findings. First, with equal ability distributions, the Stocking & Lord (SL) method performs similarly to the Mean & Sigma-average (MSa) method, and both are slightly better than the Mean & Sigma (MS) method. With unequal ability distributions, the SL method is more accurate than the MS and MSa methods, and the MS method is more accurate than the MSa method. For large sample sizes, the three methods give relatively similar equating quality when the ability distributions are equal. Second, the more anchor items there are, the better the test equating, except for the MSa method and the unequal ability distribution groups. Third, test equating for groups with unequal ability distributions is more accurate than for groups with equal ability distributions. Lastly, the larger the sample size, the better the quality of the test equating.
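
As an illustration of the equating coefficients A and B mentioned above, the following is a minimal sketch of the textbook Mean & Sigma computation from anchor-item threshold parameters. It is not the IRTEQT implementation, and the function names, the threshold values, and the reading of RMSD as a root mean squared difference are assumptions made for this example only.

```python
import numpy as np

def mean_sigma(b_base, b_target):
    """Textbook Mean & Sigma coefficients placing the target scale on the base scale.

    b_base, b_target: threshold (location) parameters of the same anchor
    items, estimated separately in the base and target groups.
    """
    b_base = np.asarray(b_base, dtype=float)
    b_target = np.asarray(b_target, dtype=float)
    A = b_base.std() / b_target.std()          # slope: ratio of standard deviations
    B = b_base.mean() - A * b_target.mean()    # intercept: aligns the means
    return A, B

def rmsd(x, y):
    """Root mean squared difference between two arrays (assumed reading of RMSD)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))

# Hypothetical anchor thresholds on the target scale (illustrative values only).
b_target = np.array([-1.5, -0.5, 0.0, 0.8, 1.6])

# If the target group is N(1,1) on the base scale but is standardized to N(0,1)
# during calibration, the true transformation is A = 1, B = 1.
b_base = 1.0 * b_target + 1.0

A, B = mean_sigma(b_base, b_target)
b_equated = A * b_target + B                   # target thresholds on the base scale
print(A, B, rmsd(b_equated, b_base))
```

With error-free parameters the sketch recovers the true coefficients exactly; in a simulation such as the one described, estimation error in the thresholds would make A and B deviate, and RMSD would summarize that deviation across replications.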


Test equating, graded response model, multiple categories


