Graduation Year


Document Type




Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Lawrence Hall, Ph.D.

Co-Major Professor

Dmitry Goldgof, Ph.D.

Committee Member

Rangachar Kasturi, Ph.D.

Committee Member

Ashwin Parthasarathy, Ph.D.

Committee Member

Robert Gillies, Ph.D.


Keywords

Lung Cancer, Computed Tomography, Image Features, Prediction, Machine Learning


Lung cancer is the leading cause of cancer-related death in the United States and worldwide. Early detection of lung cancer can improve patient outcomes, and survival prediction can inform treatment plans. Quantitative features extracted from computed tomography scans of lung cancer can be used to build predictive models for both early detection and survival prediction. Building such a model takes three steps: a detected lung nodule is segmented, image features are extracted from the segmented nodule, and a model is trained on those features to make predictions. These predictions can help radiologists improve cancer care.

Building predictive models from medical images is the basis of the budding field of radiomics. The hypothesis is that images contain phenotypic information that can be extracted to aid prediction, and that automated methods can detect some patterns beyond human perception. With improved detection and predictive models, radiomics aims to help radiologists and oncologists provide personalized care.

In this work, a model is presented to predict long-term versus short-term survival. Forty diagnostic lung computed tomography (CT) scans of adenocarcinoma cases from Moffitt Cancer Center were analyzed for survival prediction. These forty cases were in the top and bottom quartiles for survival. A decision tree classifier predicted the survival group with an accuracy of 77.5% using five image features selected from 219 with Relief-F.

Another contribution of this work is a model for predicting cancer from suspicious nodules. The National Lung Screening Trial was used to build a training set of 261 screening CTs and a test set of 237 CTs. These images were taken at the initial screening, one and two years before cancer developed. From these precursor images, the nodules that developed into cancer could be predicted with 76.79% accuracy and an area under the receiver operating characteristic curve of 0.82. A risk score was also developed to provide a measure of risk during screening. On this data set, the risk score compared favorably with Lung-RADS in predictive accuracy.
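
The area under the receiver operating characteristic curve can be read as the probability that a randomly chosen cancer case receives a higher score than a randomly chosen non-cancer case. The following is a minimal sketch of that definition, not the evaluation code used in this work; the function name `roc_auc` and the labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for cancer, 0 for non-cancer (toy values, not study data).
    scores: model outputs, where higher means more suspicious.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the metric does not depend on any single decision threshold, unlike accuracy.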

We also entered the Data Science Bowl, and this work examines the knowledge gained from a large-scale competition to improve imaging. In this competition, participants were given 1397 training cases and tasked with predicting cancer on 506 test cases. The winning entry achieved a logLoss of 0.39975 using all of the training data, while our entry scored 1.56555 with a different set of training data. A lower logLoss indicates more accurate predictions. This work explains our approach and examines the winning entry.
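
To make the logLoss metric concrete: for binary labels it is the mean of -[y log(p) + (1 - y) log(1 - p)] over all cases, where p is the predicted probability of cancer. The sketch below assumes that standard binary definition; the helper name `log_loss`, the clipping constant, and the toy predictions are illustrative and are not taken from the competition's scoring code.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy (logLoss); lower is better.

    y_true: 0/1 labels. p_pred: predicted probabilities of class 1.
    eps clips probabilities away from 0 and 1 so log() stays finite
    (the value 1e-15 is an assumption for this sketch).
    """
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Toy predictions, not competition data.
print(log_loss([1, 0], [0.5, 0.5]))  # ln 2, about 0.6931: always guessing 0.5
print(log_loss([1, 0], [0.9, 0.1]))  # about 0.1054: confident and correct
print(log_loss([1, 0], [0.1, 0.9]))  # about 2.3026: confident and wrong
```

Because the penalty grows without bound as a wrong prediction approaches certainty, a few overconfident errors can dominate the score; always predicting 0.5 gives ln 2, about 0.693, as a reference point.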

An overview of the state of radiomics as it applies to lung cancer is also provided. The predictive models contributed here are intended to provide decision support to medical practitioners. By providing tools to the medical field, the goal is to advance automated medical image analysis to aid clinicians in forming diagnoses and treatment plans.