Graduation Year

2012

Document Type

Dissertation

Degree

Ph.D.

Degree Granting Department

Psychology

Major Professor

Stephen Stark, Ph.D.

Committee Member

Michael T. Brannick, Ph.D.

Committee Member

Michael D. Coovert, Ph.D.

Committee Member

Bill N. Kinder, Ph.D.

Committee Member

Yi-Hsin Chen, Ph.D.

Keywords

crossing simultaneous item bias test, DIF, differential item functioning, item bias, item response theory likelihood ratio test, logistic regression

Abstract

The purpose of this investigation was to compare the efficacy of three methods for detecting differential item functioning (DIF). The performance of the crossing simultaneous item bias test (CSIBTEST), the item response theory likelihood ratio test (IRT-LR), and logistic regression (LOGREG) was examined across a range of experimental conditions including different test lengths, sample sizes, DIF and differential test functioning (DTF) magnitudes, and mean differences in the underlying trait distributions of comparison groups, herein referred to as the reference and focal groups. In addition, each procedure was implemented using both an all-other anchor approach, in which the IRT-LR baseline model, CSIBTEST matching subtest, and LOGREG trait estimate were based on all test items except for the one under study, and a constant anchor approach, in which the baseline model, matching subtest, and trait estimate were based on a predefined subset of DIF-free items. Response data for the reference and focal groups were generated using known item parameters based on the three-parameter logistic item response theory model (3-PLM). Various types of DIF were simulated by shifting the generating item parameters of select items to achieve desired DIF and DTF magnitudes based on the area between the groups' item response functions. Power, Type I error, and Type III error rates were computed for each experimental condition based on 100 replications, and effects were analyzed via ANOVA. Results indicated that the procedures varied in efficacy, with LOGREG, when implemented using an all-other approach, providing the best balance of power and Type I error rate. However, none of the procedures were effective at identifying the type of DIF that was simulated.
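The data-generation step described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's code. The function names, the scaling constant D = 1.702, the item parameters, the trait distributions, the sample sizes, and the integration range of -4 to 4 are all assumptions chosen for illustration. The sketch draws 3-PLM responses for a reference and a focal group, shifts the difficulty of one item for the focal group, and approximates the unsigned area between that item's two response functions.

```python
import numpy as np

def irf_3pl(theta, a, b, c, D=1.702):
    """Probability of a correct response under the 3-PLM (D is an assumed scaling constant)."""
    theta = np.asarray(theta, dtype=float)
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def simulate_responses(thetas, a, b, c, rng):
    """Draw 0/1 responses: one row per examinee, one column per item."""
    p = irf_3pl(thetas[:, None], a[None, :], b[None, :], c[None, :])
    return (rng.random(p.shape) < p).astype(int)

def unsigned_area(ref, foc, lo=-4.0, hi=4.0, n=2001):
    """Trapezoid approximation of the unsigned area between two IRFs on [lo, hi]."""
    grid = np.linspace(lo, hi, n)
    gap = np.abs(irf_3pl(grid, *ref) - irf_3pl(grid, *foc))
    step = grid[1] - grid[0]
    return float(np.sum((gap[:-1] + gap[1:]) / 2.0) * step)

rng = np.random.default_rng(0)

# Hypothetical generating parameters for a three-item test.
a = np.array([1.0, 1.2, 0.8])   # discrimination
b = np.array([0.0, -0.5, 0.7])  # difficulty
c = np.array([0.2, 0.2, 0.2])   # lower asymptote

# Assumed trait distributions: the focal group mean is 0.5 SD lower.
theta_ref = rng.normal(0.0, 1.0, 500)
theta_foc = rng.normal(-0.5, 1.0, 500)

# Simulate DIF on item 0 by shifting its difficulty for the focal group only.
b_foc = b.copy()
b_foc[0] += 0.5

resp_ref = simulate_responses(theta_ref, a, b, c, rng)
resp_foc = simulate_responses(theta_foc, a, b_foc, c, rng)

# Size of the simulated DIF on item 0, expressed as an area between IRFs.
area = unsigned_area((a[0], b[0], c[0]), (a[0], b_foc[0], c[0]))
print(f"unsigned area for item 0: {area:.3f}")
```

Shifting only the difficulty parameter, as here, moves one curve sideways without making the two curves cross. Shifting the discrimination parameter instead would make them cross, which is the kind of DIF that a crossing test such as CSIBTEST is meant to detect.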
