Graduation Year

2024

Document Type

Thesis

Degree

M.A.

Degree Name

Master of Arts (M.A.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Jiwoong Kim, Ph.D.

Committee Member

Lu Lu, Ph.D.

Committee Member

Seung-Yeop Lee, Ph.D.

Keywords

AdaBoost and Bagging, breast cancer, decision tree, machine learning, naive Bayes, SVM, logistic regression

Abstract

Breast cancer is a formidable malignancy and a substantial threat to global health and individual well-being. The prognosis for breast cancer patients is conventionally held to hinge on the timing of diagnosis and the extent of cancer progression, typically described by its stage. However, evidence from robust regression and machine learning analyses challenges this notion. The results indicate that survival months cannot be attributed solely to diagnosis and socio-economic factors; additional variables, such as existing diseases and treatment complexities, may also contribute to breast cancer outcomes.

This research examines the factors that influence the survival status of breast cancer patients beyond the traditional understanding. Using advanced regression and machine learning techniques, the study explores how multiple variables interact to affect a patient's survival status. It is motivated by a question of direct importance to cancer patients: “What factor influences survival outcome status?” The results are relevant to clinical practice and patient care, particularly where machine learning methods are applied. By acknowledging the complexity of survival outcomes, healthcare practitioners can take a comprehensive approach to treatment and patient oversight. The results show that survival status cannot be attributed solely to diagnosis and socio-economic factors; it requires a thorough assessment of individualized factors. Coexisting diseases and treatment intricacies may significantly influence the prognosis, diagnosis, and overall survival of patients.

Centering the hypothesis on survival status reflects the pressing concerns and uncertainties of cancer patients. By exploring the factors that underpin survival months, this research aims to provide insights that help healthcare providers deliver personalized care and support. Recognizing the complexity of breast cancer outcomes will enable clinicians to tailor treatment plans, consider individualized variables, and address the particular needs and concerns of patients. Ultimately, this study seeks to improve the understanding of breast cancer prognosis, patient outcomes, and the quality of care for patients with breast cancer.

This study presents an analysis of breast cancer data employing various machine learning algorithms to predict survival status. The dataset encompasses information crucial for diagnosis, including whether cancer is present or absent, classified as malignant or benign. Notably, survival status, typically associated with post-treatment prognosis, may not directly align with the context of the initial diagnosis. Longitudinal studies are essential for understanding survival outcomes dynamically, capturing the effects of treatment changes, disease progression, and time-dependent factors. Analyzing individual patient trajectories can reveal previously undetectable patterns, offering insights into breast cancer evolution and its impact on survival.

This study evaluated several classification algorithms for predicting survival status: logistic regression, decision trees, naive Bayes, support vector machine (SVM), AdaBoost, and bagging. Logistic regression remains a conventional statistical tool, but its limitations in capturing complex relationships in survival prediction were evident. In contrast, the machine learning algorithms demonstrated advantages, particularly in handling nonlinear relationships and imbalanced datasets. Advanced algorithms such as bagging predicted breast cancer survival status more accurately than logistic regression and the other methods. This reinforces the critical role of advanced machine learning techniques in improving the accuracy and reliability of breast cancer survival prediction models.
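
As an illustration of the bagging idea mentioned above, the following is a minimal sketch rather than the thesis's implementation. It assumes decision stumps as the base learner and a small synthetic dataset; the names `fit_stump`, `fit_bagging`, and `predict_bagging`, the seed, and the number of estimators are all hypothetical choices made for this example. Each stump is trained on a bootstrap resample, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-feature threshold rule that minimizes training errors."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            for above in (0, 1):  # label assigned when feature j > threshold
                errors = sum(
                    (above if r[j] > threshold else 1 - above) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, above)
    _, j, threshold, above = best
    return j, threshold, above


def predict_stump(stump, row):
    j, threshold, above = stump
    return above if row[j] > threshold else 1 - above


def fit_bagging(rows, labels, n_estimators=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx])
        )
    return ensemble


def predict_bagging(ensemble, row):
    """Majority vote over the stumps in the ensemble."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


# Synthetic toy data (an assumption, not the thesis dataset):
# the label is 1 when the first feature is at least 10.
rows = [(float(i), float((i * 7) % 10)) for i in range(20)]
labels = [1 if i >= 10 else 0 for i in range(20)]

ensemble = fit_bagging(rows, labels)
predictions = [predict_bagging(ensemble, r) for r in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

The accuracy printed here is measured on the training data only, so it says nothing about generalization; a real comparison of classifiers would use held-out data or cross-validation.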
