Doctor of Philosophy (Ph.D.)
Degree Granting Department
Mathematics and Statistics
Chris P. Tsokos, Ph.D.
Kandethody M. Ramachandran, Ph.D.
Lu Lu, Ph.D.
Yicheng Tu, Ph.D.
Healthcare Business Segment (HBS), Pancreatic Adenocarcinoma, Parametric Survival Models, Stochastic Modeling in Finance, Subjective Well Being (SWB), Survival Monitoring Indicator (SMI)
Pancreatic cancer is one of the deadliest diseases and an increasingly common cause of cancer mortality. It continues to pose massive challenges to clinicians and cancer researchers. The combined five-year survival rate for pancreatic cancer is extremely low, about 5 to 10 percent, largely because many patients are diagnosed at stage IV, after the disease has metastasized. Our study investigates whether there are statistically significant differences in the median survival times and survival probabilities of male and female pancreatic cancer patients, both at each cancer stage and irrespective of stage. We also investigated whether there is a parametric probability distribution that best fits the survival times of male and female patients at each stage and irrespective of stage, and we performed a parametric survival analysis using the SEER cancer database. We also developed a data-driven survival model, based on extreme gradient boosting, to predict the survival times of individual pancreatic cancer patients using the NIH PLCO (Prostate, Lung, Colorectal and Ovarian) cancer data. Most importantly, we identified ten risk factors that contribute significantly to the survival of patients diagnosed with pancreatic adenocarcinoma, and we ranked them by their percentage contribution. For example, the three largest contributing risk factors are the age of the patient (35.5%), current body mass index (BMI) (24.3%), and the number of years of cigarette smoking (14.93%). The proposed predictive analytical model is 96.42% accurate, and statistical testing indicates that it gives excellent predictions. We have also developed a stochastic model that is a function of two quantities we introduce: a stochastic growth intensity factor (SGIF) and a Survival Index (SI).
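The abstract does not say which estimator underlies the comparison of median survival times and survival probabilities. The Kaplan-Meier product-limit estimator is the standard nonparametric choice for right-censored follow-up data such as SEER survival times, so the sketch below assumes it. The function names and the toy data are hypothetical and are not drawn from the study.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  follow-up time for each patient (e.g., months)
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns (t, S(t)) at each distinct time where at least one death occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest event time at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical toy data for two groups, not SEER records.
    groups = {
        "male": ([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1]),
        "female": ([3, 4, 6, 7, 9, 12], [1, 0, 1, 1, 0, 1]),
    }
    for label, (t, e) in groups.items():
        print(label, median_survival(kaplan_meier(t, e)))
```

Censored patients never reduce S(t) directly; they only shrink the risk set, which is what lets the estimator use incomplete follow-up. A formal male-versus-female comparison would add a test such as the log-rank test on top of these curves.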
The SI identifies the survival rate of pancreatic cancer patients as a function of time, and the SGIF monitors the behavior of pancreatic cancer patients at a specific time. The SI is an important decision-making indicator that conveys which of three conditions holds for pancreatic cancer patients at a specific time:
- The patients’ survival time is increasing.
- The patients’ survival time remains the same.
- The patients’ survival time is decreasing.
The SI has a number of important uses. For example, pancreatic cancer patients may receive one of three treatments:
- Chemotherapy only (C)
- Radiation only (R)
- Chemotherapy and Radiation both (C+R)
The proposed SI can be used to evaluate the effectiveness of the treatment administered to a given patient; that is, to determine whether the treatment worsens the patient’s cancer, has no effect on it, or is effective against it. To our knowledge, no existing analytical model offers this kind of evaluation of different treatments. The flexibility of our model lies in the fact that it can incorporate any number of additional treatments. Furthermore, in applying the proposed analytical model, our study categorizes pancreatic cancer patients into three racial groups: Caucasian, African American, and other. In addition, our analysis is performed at four different stages of pancreatic cancer and for three age groups: 40 to 59, 60 to 79, and 80 and older. Our statistical analysis also includes other important findings; for example, it examines whether there are significant differences in survival rate between male and female pancreatic cancer patients. We have also found that the Generalized Pareto probability distribution best characterizes the survival times of pancreatic cancer patients. This finding is important for obtaining more powerful estimates in the survival analysis of these patients, because it gives more accurate results than the commonly used classical methods.
We also built predictive models for the healthcare business segment (HBS) using S&P 500 stock data. We identified and ranked the most significant financial and economic indicators, along with their significant interactions, that affect the stock return of the segment. Through analytical modeling, we identified the optimal levels of the financial indicators at which the stock price is maximized. Finally, we developed an analytical procedure that can monitor and predict the Average Weekly Percentage Return (AWPR) of the HBS.
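To make the Generalized Pareto finding concrete, the sketch below evaluates the survival function and the median survival time implied by a Generalized Pareto distribution with location mu, scale sigma > 0, and shape xi. It assumes the standard three-parameter form S(t) = (1 + xi (t - mu) / sigma)^(-1/xi) for t >= mu, with the exponential limit exp(-(t - mu) / sigma) when xi = 0. The function names and parameter values are hypothetical and are not the fitted values from this study.

```python
import math


def gpd_survival(t: float, mu: float, sigma: float, xi: float) -> float:
    """P(T > t) for a Generalized Pareto distribution (location mu, scale sigma, shape xi)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if t < mu:
        return 1.0  # no probability mass below the location parameter
    z = (t - mu) / sigma
    if xi == 0.0:
        return math.exp(-z)  # exponential limit as xi -> 0
    base = 1.0 + xi * z
    if base <= 0.0:
        # Only possible when xi < 0: t is at or beyond the upper endpoint mu - sigma / xi.
        return 0.0
    return base ** (-1.0 / xi)


def gpd_median(mu: float, sigma: float, xi: float) -> float:
    """Median survival time, i.e. the closed-form solution of S(t) = 0.5."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if xi == 0.0:
        return mu + sigma * math.log(2.0)
    return mu + sigma * (2.0 ** xi - 1.0) / xi


if __name__ == "__main__":
    # Hypothetical parameters (time in months); not estimates from the study.
    mu, sigma, xi = 0.0, 6.0, 0.2
    print("median survival:", round(gpd_median(mu, sigma, xi), 2))
    print("P(survive > 12 months):", round(gpd_survival(12.0, mu, sigma, xi), 3))
```

In practice the three parameters would be estimated from the survival times of each group, for example with `scipy.stats.genpareto.fit`, whose shape parameter `c` plays the role of xi here. A positive xi gives a heavier right tail than the exponential, which is one way a small number of long-term survivors can be captured.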
We also developed a data-driven analytical predictive model to predict the subjective well-being (SWB), or happiness, score using the world happiness data. The model predicts the happiness of an individual from certain socio-economic factors. After building the model, we ranked the attributable factors and significant interaction effects according to their percentage contribution to the happiness score. We then implemented a clustering algorithm to group the countries of the world into three clusters based on their predicted happiness scores. We compared the happiness scores of the different clusters and performed exploratory data analysis to understand which indicators contribute the most to each cluster. Finally, we validated our clustering using three popular machine learning classification algorithms and obtained excellent accuracy.
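The abstract does not name the clustering algorithm. As one common possibility, the sketch below assumes k-means (Lloyd's algorithm) with k = 3 applied to a single feature, the predicted happiness score of each country. The function names and the scores are hypothetical.

```python
from typing import List, Tuple


def _nearest(value: float, centroids: List[float]) -> int:
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def kmeans_1d(values: List[float], k: int = 3, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on one-dimensional scores.

    Returns centroids in ascending order and one label per value,
    where label 0 is the lowest-scoring cluster and k - 1 the highest.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need at least k values")
    s = sorted(values)
    # Deterministic start: centroids at evenly spaced quantile positions.
    centroids = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [_nearest(v, centroids) for v in values]  # assignment step
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):  # update step: centroid = mean of its members
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = sum(members) / len(members)
    # Relabel so cluster ids follow centroid order (low, middle, high).
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {old: new for new, old in enumerate(order)}
    return [centroids[j] for j in order], [rank[lab] for lab in labels]


if __name__ == "__main__":
    # Hypothetical predicted happiness scores for nine countries.
    scores = [3.1, 3.4, 3.3, 5.2, 5.5, 5.0, 7.1, 7.4, 7.3]
    centers, clusters = kmeans_1d(scores, k=3)
    print([round(c, 2) for c in centers], clusters)
```

The quantile-based start keeps the result reproducible without random restarts, and relabeling by centroid order makes the cluster ids read as low, medium, and high happiness, which is convenient when the labels are later used as targets for the classification algorithms that validate the clustering.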
Scholar Commons Citation
Chakraborty, Aditya, "Data-Driven Analytical Predictive Modeling for Pancreatic Cancer, Financial & Social Systems" (2022). USF Tampa Graduate Theses and Dissertations.