Graduation Year


Document Type




Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Educational Measurement and Research

Major Professor

Eunsook Kim, Ph.D.

Co-Major Professor

John Ferron, Ph.D.

Committee Member

Robert Dedrick, Ph.D.

Committee Member

Tony Tan, Ph.D.


multilevel modeling, measurement equivalence, simulation, nested data


When assessing the emotions, behaviors, or performance of preschoolers and young children, scores from adults, such as parent, psychiatrist, and teacher ratings, are used rather than scores from the children themselves. Data from parent ratings, or from parent and teacher ratings, are often nested: students are nested within teachers, and a child is nested within their parents. This common nested structure of data in the educational, social, and behavioral sciences makes measurement invariance (MI) testing across informants of children methodologically challenging. There has been a lack of studies that take the nested structure of data into account in MI testing for multiple adult informants; in particular, no simulation study has examined the performance of different models used to test MI across different raters.

This dissertation focused on two specific types of nested data in testing MI between adult raters of children: paired data and partially nested data. For paired data, the independence assumption of regular MI testing is often violated because the two informants (e.g., father and mother) rate the same child, and their scores are expected to be related or dependent. Partially nested data arise when teacher and parent ratings are compared. In this scenario, it is common for each parent to have only one child to rate while each teacher has multiple children in their classroom. Thus, for teacher and parent ratings of the same children, the data are repeated measures and also partially nested. Because of these unique features, MI testing between adult informants of children requires statistical models that take different types of data dependency into account. I proposed two statistical models that can handle repeated measures and partial nesting and evaluated their performance, together with one commonly used model and one potentially appropriate model, across several simulated research scenarios.

Results of the two simulation studies in this dissertation showed that, for paired data, both the multiple-group confirmatory factor analysis (CFA) model and the repeated measure CFA model were able to detect scalar invariance most of the time using the Δχ² test and ΔCFI. Although the multiple-group CFA model (Model 2) detected scalar invariance better than the repeated measure CFA model (Model 1), the detection rates of Model 1 were still high (88% to 91% using the Δχ² test and 84% to 100% using ΔCFI or ΔRMSEA). For the configural and metric invariance conditions with paired data, Model 1 had a higher detection rate than Model 2 in almost all research scenarios examined in this dissertation. In particular, while Model 1 could detect noninvariance (either in intercepts only or in both intercepts and factor loadings) for paired data most of the time, Model 2 could rarely catch it when the suggested cut-off of 0.01 for RMSEA differences was used. For paired data, although both Models 1 and 2 could be good choices for testing measurement invariance, Model 1 might be favored if researchers are more interested in detecting noninvariance, because of its overall high detection rates at all three levels of measurement invariance (i.e., configural, metric, and scalar).

For scalar invariance with partially nested data, both the multilevel repeated measure CFA model and the design-based multilevel CFA model could detect invariance most of the time (from 81% to 100% of examined cases), with a slightly higher detection rate for the former model than for the latter. The multiple-group CFA model hardly detected scalar invariance except when the ICC was small. The detection rates for configural invariance using the Δχ² test or the Satorra-Bentler LRT were also highest for Model 3 (82% to 100%, except for two conditions with detection rates of 61%), followed by Model 5, and lowest for Model 4. Models 4 and 5 could reach these rates only with the largest sample sizes (i.e., a large number of clusters, a large cluster size, or both) when the magnitude of noninvariance was small. Unlike scalar and configural invariance, the ability to detect metric invariance was highest for Model 4, followed by Model 5, and lowest for Model 3 across many conditions using all three performance criteria.

Given its higher detection rates for all configural and scalar invariance conditions and its moderate detection rates for many metric invariance conditions (except in cases of a small number of clusters combined with a large ICC), Model 3 could be a good candidate for testing measurement invariance with partially nested data when the number of clusters is sufficient, or when the number of clusters is small but the ICC is also small. Model 5 might also be a reasonable option for this type of data if both the number of clusters and the cluster size are large (i.e., 80 and 20, respectively), or if either of these two factors is large and the ICC is small. If the ICC is not small, it is recommended to have a large number of clusters, or a combination of a large number of clusters and a large cluster size, to ensure high detection rates of measurement invariance for partially nested data. Because multiple-group CFA had better and reasonable detection rates than the design-based and multilevel repeated measure CFA models across configural, metric, and scalar invariance under the conditions of small cluster size (10) and small ICC (0.13), researchers can consider using this model to test measurement invariance when they can collect only 10 participants within a cluster (e.g., students within a classroom) and there is a small degree of data dependency (e.g., small variance between clusters) in the data.
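
As a rough illustration of two quantities the abstract relies on, the sketch below computes an ICC from between-cluster and within-cluster variances and applies change-in-fit cut-offs when a more constrained invariance model is compared with a less constrained one. It is a minimal sketch and not the procedure used in the dissertation: the names `Fit`, `icc`, and `flags_noninvariance` are hypothetical, the ICC formula is the standard variance ratio, the 0.01 cut-off for RMSEA differences is the one mentioned above, and the 0.01 cut-off for CFI differences is an assumption, since the abstract does not state it.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    """Fit indices of one fitted CFA model (hypothetical container)."""
    cfi: float
    rmsea: float


def icc(between_var: float, within_var: float) -> float:
    """Intraclass correlation: share of total variance lying between clusters."""
    return between_var / (between_var + within_var)


def flags_noninvariance(less_constrained: Fit, more_constrained: Fit,
                        cfi_cutoff: float = 0.01,    # assumed, not from the abstract
                        rmsea_cutoff: float = 0.01   # cut-off mentioned in the abstract
                        ) -> dict:
    """Flag noninvariance separately under the delta-CFI and delta-RMSEA criteria.

    Adding equality constraints should not noticeably worsen fit when
    invariance holds, so a CFI drop or an RMSEA rise beyond the cut-off
    is read as evidence of noninvariance.
    """
    cfi_drop = less_constrained.cfi - more_constrained.cfi
    rmsea_rise = more_constrained.rmsea - less_constrained.rmsea
    return {
        "delta_cfi": cfi_drop > cfi_cutoff,
        "delta_rmsea": rmsea_rise > rmsea_cutoff,
    }


# Illustrative values only; these are not results from the dissertation.
metric = Fit(cfi=0.980, rmsea=0.040)
scalar = Fit(cfi=0.950, rmsea=0.060)
print(icc(between_var=0.13, within_var=0.87))
print(flags_noninvariance(metric, scalar))
```

Reporting the two criteria separately mirrors the way the abstract gives detection rates per criterion, so that disagreements between ΔCFI and ΔRMSEA remain visible rather than being merged into a single decision.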