Graduation Year
2021
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Public Health
Major Professor
Henian Chen, M.D., Ph.D.
Co-Major Professor
Wei Wang, Ph.D.
Committee Member
Getachew Dagne, Ph.D.
Committee Member
Ellen Daley, Ph.D.
Keywords
confidentiality, data privacy, generalized linear models, hypothesis testing, statistical disclosure limitation
Abstract
Background: There is a need for rigorous and standardized methods of privacy protection for shared data in the health sciences. Differential privacy is one such method that has gained much popularity due to its versatility and robustness. This study evaluates differential privacy for explanatory regression modeling in the context of health research.
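For context (this definition is not stated in the abstract itself), the epsilon-differential privacy guarantee underlying the algorithms evaluated here requires that a randomized mechanism M satisfy, for any two datasets D and D′ differing in a single record and any set of outputs S,

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S],

so smaller values of the privacy parameter epsilon correspond to stronger privacy protection and, typically, noisier releases.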
Methods: Surveyed and newly proposed algorithms were evaluated with respect to the accuracy (bias and RMSE) of coefficient estimates, the empirical coverage probability of confidence intervals, and the power and type I error rates of hypothesis tests. Evaluations were conducted using both simulated data and real data from a study of adolescent behavioral health.
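The following is a minimal sketch (not the dissertation's code) of how the evaluation metrics named above could be computed across simulation replicates for a single coefficient; the function name and argument names are hypothetical.

```python
import numpy as np

def evaluate_replicates(beta_true, estimates, ci_lowers, ci_uppers, pvalues, alpha=0.05):
    """Summarize accuracy and inference properties over simulation replicates.

    Each array holds one value per replicate for a single coefficient
    whose true value is beta_true.
    """
    estimates = np.asarray(estimates, dtype=float)
    ci_lowers = np.asarray(ci_lowers, dtype=float)
    ci_uppers = np.asarray(ci_uppers, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)

    bias = np.mean(estimates - beta_true)                   # average deviation from the true value
    rmse = np.sqrt(np.mean((estimates - beta_true) ** 2))   # root mean squared error
    coverage = np.mean((ci_lowers <= beta_true) & (beta_true <= ci_uppers))  # empirical CI coverage
    rejection_rate = np.mean(pvalues < alpha)               # power if beta_true != 0, type I error if beta_true == 0
    return {"bias": bias, "rmse": rmse, "coverage": coverage, "rejection_rate": rejection_rate}
```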
Results: For coefficient estimation, the simulation found the objective and output perturbation algorithms to be the most accurate for logistic models, and subsample-and-aggregate emerged as the most accurate for linear and log-linear models. However, only objective and output perturbation had sufficiently low noise at reasonable settings of the privacy parameter epsilon. The empirical coverage probability of confidence intervals approached the nominal 95% rate only for the output perturbation algorithm, and only at less private settings of epsilon. Of the available algorithms for hypothesis testing, only the Noisy Aggregated Censored z-test maintained an appropriate type I error rate, though its power was satisfactory only at the least private settings of epsilon.
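As a simplified illustration of the general idea behind output perturbation (fit the model non-privately, then release noise-perturbed coefficients), consider the sketch below. It is not the algorithm evaluated in the dissertation, which calibrates noise to the model's sensitivity and regularization; here `sensitivity` is a user-supplied assumption and a linear model is used for brevity.

```python
import numpy as np

def output_perturbed_coefficients(X, y, epsilon, sensitivity, rng=None):
    """Return least-squares coefficients with Laplace noise added before release."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.column_stack([np.ones(len(y)), X])           # add intercept column
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # non-private least-squares fit
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=beta_hat.shape)
    return beta_hat + noise                              # perturbed coefficients released to the analyst
```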
Conclusions: The objective and output perturbation algorithms emerged as the most promising for differentially private regression statistics. Further work is needed to derive corresponding algorithms for statistical inference.
Scholar Commons Citation
Ficek, Joseph, "Differential Privacy for Regression Modeling in Health: An Evaluation of Algorithms" (2021). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/9674