Graduation Year

2021

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Public Health

Major Professor

Henian Chen, M.D., Ph.D.

Co-Major Professor

Wei Wang, Ph.D.

Committee Member

Getachew Dagne, Ph.D.

Committee Member

Ellen Daley, Ph.D.

Keywords

confidentiality, data privacy, generalized linear models, hypothesis testing, statistical disclosure limitation

Abstract

Background: There is a need for rigorous and standardized methods of privacy protection for shared data in the health sciences. Differential privacy is one such method that has gained much popularity due to its versatility and robustness. This study evaluates differential privacy for explanatory regression modeling in the context of health research.

Methods: Surveyed and newly proposed algorithms were evaluated with respect to the accuracy (bias and RMSE) of coefficient estimates, the empirical coverage probability of confidence intervals, and the power and type I error rates of hypothesis tests. Evaluations used both simulated data and real data from a study of adolescent behavioral health.
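As an illustration of the accuracy and coverage criteria named above, the following minimal sketch computes bias, RMSE, and empirical coverage probability from a set of simulation replicates. The function name, its arguments, and the use of a normal-approximation (Wald) interval are assumptions made for illustration; the dissertation's own evaluation code is not shown on this page.

```python
import math

def evaluate_estimates(estimates, std_errors, true_value, z=1.96):
    """Summarize replicate estimates of one coefficient (hypothetical helper).

    estimates  : point estimates, one per simulation replicate
    std_errors : matching standard errors, one per replicate
    true_value : the coefficient value used to generate the data
    z          : normal critical value (1.96 for a nominal 95% interval)
    """
    n = len(estimates)
    errors = [est - true_value for est in estimates]

    # Bias: average signed error across replicates.
    bias = sum(errors) / n

    # RMSE: square root of the average squared error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Empirical coverage: share of intervals that contain the true value.
    covered = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if lower <= true_value <= upper:
            covered += 1
    coverage = covered / n

    return {"bias": bias, "rmse": rmse, "coverage": coverage}


result = evaluate_estimates([1.0, 3.0], [0.1, 1.0], true_value=2.0)
```

In this toy call the two errors cancel, so the bias is zero while the RMSE is one, and only the second interval is wide enough to contain the true value.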

Results: For coefficient estimation, the simulation found the objective and output perturbation algorithms to be the most accurate for logistic models, and subsample-and-aggregate emerged as the most accurate for linear and log-linear models. However, only objective and output perturbation had sufficiently low noise at reasonable settings of the privacy parameter epsilon. The empirical coverage probability of confidence intervals only neared the nominal 95% rate for the output perturbation algorithm, at less private settings of epsilon. Of the available algorithms for hypothesis testing, only the Noisy Aggregated Censored z-test maintained an appropriate type I error rate, though power was only satisfactory at the least private settings of epsilon.
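To make the role of epsilon concrete, the sketch below applies the standard Laplace mechanism to a bounded mean, where noise is drawn with scale equal to sensitivity divided by epsilon. This is not one of the regression algorithms evaluated in the dissertation; it is a generic textbook mechanism, and the function names, bounds, and sample values are assumptions chosen for illustration. It is also not hardened against floating-point attacks and should not be used to release real data.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-differentially private mean of values clipped to [lower, upper].

    Hypothetical helper: returns the noisy mean and the noise scale used.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)

    # Replacing one record can move the clipped mean by at most this much.
    sensitivity = (upper - lower) / n

    # Smaller epsilon (more private) means a larger noise scale.
    scale = sensitivity / epsilon

    noisy = sum(clipped) / n + laplace_noise(scale, rng)
    return noisy, scale


rng = random.Random(0)
data = [rng.random() for _ in range(100)]  # made-up values in [0, 1]

_, scale_strict = private_mean(data, 0.0, 1.0, epsilon=0.1, rng=rng)
_, scale_loose = private_mean(data, 0.0, 1.0, epsilon=1.0, rng=rng)
```

With 100 records bounded in [0, 1], moving epsilon from 1.0 to 0.1 multiplies the noise scale by ten, which is the kind of trade-off behind the accuracy and power results reported at more private settings.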

Conclusions: The objective and output perturbation algorithms emerged as the most promising for differentially private regression statistics. Further work is needed to derive corresponding algorithms for statistical inference.

Scholar Commons Citation

Ficek, Joseph, "Differential Privacy for Regression Modeling in Health: An Evaluation of Algorithms" (2021). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/9674

Included in

Biostatistics Commons
