Graduation Year


Document Type




Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Chris P. Tsokos, Ph.D.

Committee Member

Khandethody M. Ramachandran, Ph.D.

Committee Member

Lu Lu, Ph.D.

Committee Member

Getachew A. Dagne, Ph.D.


CPI, DIS, F8, Hemophilia, K-means, Inhibitor, Severity, Diabetes, Prediabetes, Forest, SVM


Parametric analysis of real-world data is a powerful tool for characterizing probabilistic behavior in social, economic, medical, epidemiological, and other areas of study. In the present study, we identify the theoretical probability distribution function (PDF) of the Democracy Index Scores (DIS) from the Economist Intelligence Unit (EIU) database and obtain maximum likelihood estimates of its parameters. We also identify individual PDFs for each of the four clusters defined by the EIU: Full Democracy, Flawed Democracy, Hybrid Regime, and Authoritarian Regime.
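As an illustration of the distribution-identification step described above, the following is a minimal sketch, not the procedure used in the thesis. It assumes two candidate families (normal and log-normal, chosen only because their maximum likelihood estimates have closed forms), ranks them by AIC, and uses made-up scores; the function names are hypothetical.

```python
import math

def fit_normal(xs):
    """Closed-form MLE for a normal distribution; returns (params, log-likelihood)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # the MLE divides by n, not n - 1
    loglik = sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
                 for x in xs)
    return {"mu": mu, "sigma": math.sqrt(var)}, loglik

def fit_lognormal(xs):
    """MLE for a log-normal distribution: fit a normal to log(x), x > 0."""
    logs = [math.log(x) for x in xs]
    params, loglik_logs = fit_normal(logs)
    # Change of variables: log f_X(x) = log f_Y(log x) - log x
    return params, loglik_logs - sum(logs)

def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * loglik

def best_fit(xs):
    """Fit every candidate family and return (best name, {name: (params, AIC)})."""
    fits = {"normal": fit_normal(xs), "lognormal": fit_lognormal(xs)}
    scored = {name: (params, aic(ll, 2)) for name, (params, ll) in fits.items()}
    best = min(scored, key=lambda name: scored[name][1])
    return best, scored

# Hypothetical scores on a 0-10 scale, for illustration only (not EIU data).
scores = [9.1, 8.3, 7.9, 6.8, 6.1, 5.4, 4.9, 3.7, 3.1, 2.4, 1.9]
name, table = best_fit(scores)
print(name, table[name])
```

A fuller analysis would consider more candidate families and add a goodness-of-fit test; the same pattern can be applied separately to each regime cluster.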

A statistical model is a convenient instrument for predicting future values of a real phenomenon. In addition to identifying probability distributions, we develop a regression model that predicts the DIS for 167 countries of the world with a high degree of accuracy. We then perform cluster analysis with the K-means clustering algorithm, based on the DIS predicted by this model.
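The clustering step can be sketched as follows. This is a minimal one-dimensional K-means (Lloyd's algorithm) under stated assumptions: the quantile-based initialization, the function name kmeans_1d, and the sample scores are all illustrative choices, not details taken from the thesis.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalar data; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: deterministic start at evenly spaced quantiles of the data.
    centroids = [ordered[int((i + 0.5) * n / k)] for i in range(k)]

    def nearest(v):
        # Index of the centroid closest to v (squared distance).
        return min(range(k), key=lambda c: (v - centroids[c]) ** 2)

    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated

    labels = [nearest(v) for v in values]
    return centroids, labels

# Hypothetical predicted scores, for illustration only.
predicted = [9.2, 8.7, 8.1, 7.4, 6.9, 6.2, 5.5, 4.8, 3.9, 3.0, 2.2, 1.6]
centers, labels = kmeans_1d(predicted, k=4)
print(centers, labels)
```

With k = 4, the resulting groups can be compared against the four EIU regime categories.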

Using the Corruption Perception Index (CPI) and the World Governance Index (WGI), extracted from the Transparency International (TI) and World Bank (WB) databases respectively, we estimate a theoretical PDF of the CPI for 175 countries of the world. Moreover, we estimate individual PDFs for each of the clusters: Highly Corrupted, Moderately Corrupted, Fairly Corrupted, and Least Corrupted countries of the world.

We conduct statistical analyses of Hemophilia A, based on data retrieved from the Centers for Disease Control and Prevention (CDC) CHAMP F8 surveillance program, to identify the risk factors associated with the severity level of Hemophilia A. We also identify a statistical model for predicting the probability of the severity level of Hemophilia A.

Finally, we compare several standard machine learning algorithms to identify the one that best classifies and predicts the prediabetes status of individuals. The data for this study were extracted from the National Health and Nutrition Examination Surveys (NHANES), a program of the Centers for Disease Control and Prevention (CDC). We compare the identified champion algorithm with the machine learning algorithms suggested by researchers in other countries of the world.
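Selecting a champion algorithm amounts to scoring each candidate's predictions on the same held-out labels. The sketch below assumes binary labels (1 = prediabetes, 0 = not) and accuracy as the selection metric; the model names and predictions are placeholders, and the thesis may rely on different metrics.

```python
def confusion(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from the confusion counts."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

def champion(y_true, predictions, key="accuracy"):
    """Score every model and return (best model name, all scores)."""
    scores = {name: metrics(y_true, pred) for name, pred in predictions.items()}
    return max(scores, key=lambda name: scores[name][key]), scores

# Placeholder labels and predictions, for illustration only (not NHANES data).
truth = [1, 0, 1, 1, 0, 0, 1, 0]
preds = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 0, 1, 0, 1, 1, 0],
}
best, table = champion(truth, preds)
print(best, table[best])
```

For a screening problem such as prediabetes, sensitivity may matter more than raw accuracy; passing key="sensitivity" changes the selection criterion.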