Graduation Year

2023

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Public Health

Major Professor

Janice Zgibor, RPh, Ph.D.

Co-Major Professor

Jason Beckstead, Ph.D.

Committee Member

Wei Wang, Ph.D.

Committee Member

Henian Chen, MD, Ph.D.

Keywords

Clustering Imputation, K-Means, Latent Class Imputation, Missing Data, Multiple Imputation, K-Modes, Non-Parametric Imputation

Abstract

Research presents a variety of difficulties, especially when it involves human subjects, and one of the most prevalent is missing data. Missing data will always be present in research because no method of collecting data fully protects against human error or mechanical failure. Researchers must therefore be able to mitigate the problems that come with missing data: reduced power of an analysis and bias introduced by the missingness pattern. This research investigated a non-parametric method that uses a nested approach of fuzzy K-Modes and fuzzy C-Means clustering to impute missing data, in an effort to reduce the issues introduced by the most severe type of missing data, missing not at random. The method was compared to complete case analysis and Latent Class Imputation. The simulation results showed that the proposed method did not sufficiently remove the bias imposed by the missing not at random pattern and could not successfully detect statistically significant regression coefficients. The method performed better with continuous variables and proved to be a more efficient estimation method than Latent Class Imputation, with lower standard errors of the estimates in every scenario. Latent Class Imputation with incorrectly specified priors also failed to sufficiently mitigate the issues of the missing not at random pattern. Because the proposed method is more efficient than Latent Class Imputation and requires no outside assistance beyond refinement of the algorithm, it holds promise that, with further tuning, it will successfully mitigate the problems of data that are missing not at random.

Included in

Biostatistics Commons
