Graduation Year


Document Type




Degree Name

Doctor of Public Health (Dr.PH.)

Degree Granting Department

Epidemiology and Biostatistics

Major Professor

Yangxin Huang, Ph.D.

Committee Member

Henian Chen, M.D., Ph.D.

Committee Member

Yiliang Zhu, Ph.D.

Committee Member

Getachew Dagne, Ph.D.

Committee Member

Julie Baldwin, Ph.D.


Two-part mixed-effects model, Substance abuse/dependence symptoms data, Bayesian analysis, Skewed distributions, Semi-continuous longitudinal data


Substance use data such as alcohol drinking often contain a high proportion of zeros. In studies examining the alcohol consumption in college students, for instance, many students may not drink in the studied period, resulting in a number of zeros. Zero-inflated continuous data, also called semi continuous data, typically consist of a mixture of a degenerate distribution at the origin (zero) and a right-skewed, continuous distribution for the positive values. Ignoring the extreme non-normality in semi-continuous data may lead to substantially biased estimates and inference. Longitudinal or repeated measures of semi-continuous data present special challenges in statistical inference because of the correlation tangled in the repeated measures on the same subject.

Linear mixed-eects models (LMM) with normality assumption that is routinely used to analyze correlated continuous outcomes are inapplicable for analyzing semi-continuous outcome. Data transformation such as log transformation is typically used to correct the non-normality in data. However, log-transformed data, after the addition of a small constant to handle zeros, may not successfully approximate the normal distribution due to the spike caused by the zeros in the original observations. In addition, the reasons that data transformation should be avoided include: (i) transforming usually provides reduced information on an underlying data generation mechanism; (ii) data transformation causes diculty in regard to interpretation of the transformed scale; and (iii) it may cause re-transformation bias. Two-part mixed-eects models with one component modeling the probability of being zero and one modeling the intensity of nonzero values have been developed over the last ten years to analyze the longitudinal semi-continuous data. However, log transformation is still needed for the right-skewed nonzero continuous values in the two-part modeling.

In this research, we developed Bayesian hierarchical models in which the extreme non-normality in the longitudinal semi-continuous data caused by the spike at zero and right skewness was accommodated using skew-elliptical (SE) distribution and all of the inferences were carried out through Bayesian approach via Markov chain Monte Carlo (MCMC). The substance abuse/dependence data, including alcohol abuse/dependence symptoms (AADS) data and marijuana abuse/dependence symptoms (MADS) data from a longitudinal observational study, were used to illustrate the proposed models and methods. This dissertation explored three topics:

First, we presented one-part LMM with skew-normal (SN) distribution under Bayesian framework and applied it to AADS data. The association between AADS and gene serotonin transporter polymorphism (5-HTTLPR) and baseline covariates was analyzed. The results from the proposed model were compared with those from LMMs with normal, Gamma and LN distributional assumptions. Simulation studies were conducted to evaluate the performance of the proposed models. We concluded that the LMM with SN distribution not only provides the best model t based on Deviance Information Criterion (DIC), but also offers more intuitive and convenient interpretation of results, because it models the original scale of response variable.

Second, we proposed a flexible two-part mixed-effects model with skew distributions including skew-t (ST) and SN distributions for the right-skewed nonzero values in Part II of model under a Bayesian framework. The proposed model is illustrated with the longitudinal AADS data and the results from models with ST, SN and normal distributions were compared under different random-effects structures. Simulation studies are conducted to evaluate the performance of the proposed models.

Third, multivariate (bivariate) correlated semi-continuous data are also commonly encountered in clinical research. For instance, the alcohol use and marijuana use may be observed in the same subject and there might be underlying common factors to cause the dependence of alcohol and marijuana uses. There is very limited literature on multivariate analysis of semi-continuous data. We proposed a Bayesian approach to analyze bivariate semi-continuous outcomes by jointly modeling a logistic mixed-effects model on zero-inflation in either response and a bivariate linear mixed-effects model (BLMM) on the positive values through a correlated random-effects structure. Multivariate skew distributions including ST and SN distributions were used to relax the normality assumption in BLMM. The proposed models were illustrated with an application to the longitudinal AADS and MADS data. A simulation study was conducted to evaluate the performance of the proposed models.

Included in

Biostatistics Commons