Degree Granting Department
Computer Science and Engineering
Lawrence O. Hall, Ph.D.
Dmitry Goldgof, Ph.D.
Sudeep Sarkar, Ph.D.
Keywords
Classifiers, CT-scan, Feature Selection, Image Features, Radiomics, Support Vector Machine
The CT-scan of the lungs has become ubiquitous as a thoracic diagnostic tool. Using CT-scan images to develop predictive models for tumor type and survival time in patients afflicted with Non-Small Cell Lung Cancer (NSCLC) would therefore provide a novel, non-invasive approach to tumor analysis and an alternative to histopathological techniques such as needle biopsy. Two major tumor analysis problems were addressed in the course of this study: tumor type classification and survival time prediction. CT-scan images of 109 patients with NSCLC were used in this study. The first problem involved classifying tumors into two major classes of non-small cell lung tumors, Adenocarcinoma and Squamous-cell Carcinoma, each constituting 30% of all lung tumors. In a first-of-its-kind investigation, a large group of 2D and 3D image features, which were hypothesized to be useful, was evaluated for effectiveness in classifying the tumors. Classifiers including decision trees and support vector machines (SVM) were used along with feature selection techniques (wrappers and Relief-F) to build models for tumor classification. Results show that, over the large feature space for both 2D and 3D features, it is possible to predict tumor classes with over 63% accuracy, suggesting that the new features may be of help. The accuracy achieved using 2D and 3D features is similar, with 3D features being easier to use. The tumor classification study was then extended by introducing the Bronchioalveolar Carcinoma (BAC) tumor type. Following up on the hypothesis that Bronchioalveolar Carcinoma is substantially different from other NSCLC tumor types, a two-class problem was created in which an attempt was made to differentiate BAC from the other two tumor types. To reduce the three-class problem to a two-class problem, misclassifications between Adenocarcinoma and Squamous-cell Carcinoma were ignored. Using the same prediction models as the previous study and only 3D image features, tumor classes were predicted with around 77% accuracy.
The final study involved predicting two-year survival in patients suffering from NSCLC. Using a subset of the image features and a handful of clinical features, predictive models were developed to predict two-year survival in 95 NSCLC patients. A support vector machine classifier, a naive Bayes classifier and a decision tree classifier were used to develop the predictive models. Using the Area Under the Curve (AUC) as the performance metric, different models were developed and analyzed for their effectiveness in predicting survival time. A novel feature selection method that groups features based on a correlation measure is proposed in this work, along with feature space reduction using principal component analysis. The parameters of the support vector machine were tuned using grid search. A model based on a combination of image and clinical features achieved the best performance, with an AUC of 0.69, using dimensionality reduction by means of principal component analysis along with grid search to tune the parameters of the SVM classifier. The study showed the effectiveness of a predominantly image-based feature space in predicting survival time. A comparison of the performance of the models from the different classifiers also indicates that SVMs consistently outperformed or matched the other two classifiers for this data.
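As an illustration of the AUC metric used in the survival study, the following is a minimal Python sketch, not the implementation used in the thesis. It computes the AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher classifier score, counting ties as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher means "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (made-up data): 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_scores))  # prints 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported value of 0.69.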
Scholar Commons Citation
Basu, Satrajit, "Developing Predictive Models for Lung Tumor Analysis" (2012). USF Tampa Graduate Theses and Dissertations.