Graduation Year


Document Type




Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Educational Measurement and Research

Major Professor

Yi-Hsin Chen, Ph.D.

Co-Major Professor

Jeffrey D. Kromrey, Ph.D.

Committee Member

John Ferron, Ph.D.

Committee Member

Stephen Stark, Ph.D.


differential item functioning, validity, item response modeling, Rasch models, the MIRID


Differential item functioning (DIF) is a psychometric issue routinely considered in educational and psychological assessment. However, it has not been studied in the context of a recently developed componential statistical model, the model with internal restrictions on item difficulty (MIRID; Butter, De Boeck, & Verhelst, 1998). Because the MIRID requires test questions measuring either single or multiple cognitive processes, it creates a complex environment for which traditional DIF methods may be inappropriate. This dissertation sought to extend the MIRID framework to detect DIF at the item-group level and the individual-item level. Such a model-based approach can increase the interpretability of DIF statistics by focusing on item characteristics as potential sources of DIF. In particular, group-level DIF may reveal comparative group strengths in certain secondary constructs. A simulation study was conducted to examine under different conditions parameter recovery, Type I error rates, and power of the proposed approach. Factors manipulated included sample size, magnitude of DIF, distributional characteristics of the groups, and the MIRID DIF models corresponding to discrete sources of differential functioning. The impact of studying DIF using wrong models was investigated.

The results from the recovery study of the MIRID DIF model indicate that the four delta (i.e., non-zero value DIF) parameters were underestimated whereas item locations of the four associated items were overestimated. Bias and RMSE were significantly greater when delta was larger; larger sample size reduced RMSE substantially while the effects from the impact factor were neither strong nor consistent. Hypothesiswise and adjusted experimentwise Type I error rates were controlled in smaller delta conditions but not in larger delta conditions as estimates of zero-value DIF parameters were significantly different from zero. Detection power of the DIF model was weak. Estimates of the delta parameters of the three group-level DIF models, the MIRID differential functioning in components (DFFc), the MIRID differential functioning in item families (DFFm), and the MIRID differential functioning in component weights (DFW), were acceptable in general. They had good hypothesiswise and adjusted experimentwise Type I error control across all conditions and overall achieved excellent detection power.

When fitting the proposed models to mismatched data, the false detection rates were mostly beyond the Bradley criterion because the zero-value DIF parameters in the mismatched model were not estimated adequately, especially in larger delta conditions. Recovery of item locations and component weights was also not adequate in larger delta conditions. Estimation of these parameters was more or less affected adversely by the DIF effect simulated in the mismatched data. To study DIF in MIRID data using the model-based approach, therefore, more research is necessary to determine the appropriate procedure or model to implement, especially for item-level differential functioning.