Graduation Year


Document Type




Degree Name

Master of Arts (M.A.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Kandethody M. Ramachandran, Ph.D.

Committee Member

Feng Cheng, Ph.D.

Committee Member

Chris P. Tsokos, Ph.D.


Association Study, Big Data, Pharmacovigilance, Data Mining, Statistical Algorithms


One of the objectives of the U.S. Food and Drug Administration is to protect the public health through post-marketing drug safety surveillance, also known as Pharmacovigilance. An inexpensive and efficient method to inspect post-marketing drug safety is to use data mining algorithms on electronic health records to discover associations between drugs and adverse events.

The purpose of this study is two-fold. First, we review the methods and algorithms proposed in the literature for identifying association drug interactions to an adverse event and discuss their advantages and drawbacks. Second, we attempt to adapt some novel methods that have been used in comparable problems such as the genome-wide association studies and the market-basket problems. Most of the common methods in the drug-adverse event problem have univariate structure and thus are vulnerable to give false positive when certain drugs are usually co-prescribed. Therefore, we will study applicability of multivariate methods in the literature such as Logistic Regression and Regression-adjusted Gamma-Poisson Shrinkage Model for the association studies. We also adopted Random Forest and Monte Carlo Logic Regression from the genome-wide association study to our problem because of their ability to detect inherent interactions. We have built a computer program for the Regression-adjusted Gamma Poisson Shrinkage model, which was proposed by DuMouchel in 2013 but has not been made available in any public software package. A comparison study between popular methods and the proposed new methods is presented in this study.