Graduation Year


Document Type




Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Dmitry B. Goldgof, Ph.D.

Co-Major Professor

Lawrence O. Hall, Ph.D.

Committee Member

Sudeep Sarkar, Ph.D.

Committee Member

Ashwin Parthasarathy, Ph.D.

Committee Member

Robert Gillies, Ph.D.


CNN, Deep Features, Ensemble, Lung Cancer, Radiomics


Lung cancer has high incidence and mortality worldwide; the five-year relative survival rate for all lung cancers is 18%, which makes early detection essential. Low-dose computed tomography (CT) is commonly used for screening, diagnosis, and prognosis of non-small cell lung cancer (NSCLC). The National Lung Screening Trial (NLST) compared low-dose helical computed tomography (LDCT) and standard chest radiography (CXR) over three annual screens and reported a 20% relative reduction in lung cancer mortality for LDCT compared to CXR. LDCT screening is therefore an effective way of reducing lung cancer mortality and is the only imaging option for those at high risk. Screening of high-risk patients often detects a large number of indeterminate pulmonary nodules, of which only a subset will be diagnosed as cancer. Reliable and reproducible biomarkers that determine which indeterminate pulmonary nodules will be diagnosed as cancer would therefore have significant translational implications as a means of enhancing lung cancer screening.

Radiomics is an approach for extracting high-dimensional quantitative features from medical images, which can be used alone or merged with clinical data for predictive and diagnostic analysis. Quantitative radiomics features (size, shape, and texture) extracted from lung CT scans have been shown to predict cancer incidence and prognosis. Deep learning is an emerging machine learning approach that has been applied to the classification and analysis of various cancers and tumors. To generate generic features (blobs, edges, etc.) from an image, different convolutional kernels are applied over the input image, and the resulting feature maps are then passed through fully connected neural layers. This type of neural network is called a convolutional neural network (CNN) and has achieved high accuracy on image data. With the advancement of deep learning and CNNs, deep features can be used to analyze lung CT scans for diagnosis and prognosis prediction.
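As a minimal illustration of how a convolutional kernel turns an image into a feature map, the sketch below slides a small kernel over a toy image. The image, the hand-written Sobel-like kernel, and the function name `convolve2d_valid` are assumptions made for illustration; a CNN learns its kernels from data, and none of this is taken from the dissertation's implementation. As in most CNN libraries, the operation shown is technically cross-correlation (the kernel is not flipped).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel and the image patch, summed.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 "image": dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical hand-written vertical-edge kernel (Sobel-like).
edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])

feature_map = convolve2d_valid(image, edge_kernel)
print(feature_map)
```

The response is non-zero only where the kernel window straddles the edge, which is the sense in which such kernels produce "generic" edge or blob features.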

This dissertation presents deep learning-based approaches for lung nodule malignancy prediction. A subset of cases from the NLST was used as the dataset.

We experimented with three different pre-trained CNNs for extracting deep features and used five different classifiers. Experiments were also conducted with deep features from different color channels of a pre-trained CNN. Selected deep features were combined with radiomics features. Three CNNs were also designed and trained. Combinations of features from pre-trained CNNs, CNNs trained on NLST data, and classical radiomics were used to build classifiers. The best accuracy (76.79%) was obtained using feature combinations. An area under the receiver operating characteristic curve (AUC) of 0.87 was obtained using a CNN trained on an augmented NLST data cohort.
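Combining deep features with radiomics features can be sketched as a simple concatenation of the two feature blocks for each nodule. The example below uses synthetic placeholder data, and the per-feature z-scoring step is an assumption added so that blocks on very different scales are comparable; the abstract does not state which scaling or feature selection was actually applied.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodules = 8

# Synthetic placeholders, NOT real data: 5 "deep" features and 3 "radiomics"
# features per nodule, drawn on deliberately different scales.
deep_features = rng.normal(size=(n_nodules, 5))
radiomics_features = rng.normal(loc=50.0, scale=10.0, size=(n_nodules, 3))

def zscore(x):
    """Standardize each column to zero mean and unit variance (assumed step)."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (x - mean) / std

# Fused feature matrix: one row per nodule, deep and radiomics columns side by side.
fused = np.hstack([zscore(deep_features), zscore(radiomics_features)])
print(fused.shape)
```

The fused matrix can then be passed to any standard classifier in place of either feature set alone.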

Each of the three CNNs was then trained with seven different seeds for weight initialization. This introduced variability among the CNN models, which were combined to generate a more robust and accurate ensemble model. Augmenting images using only rotation and flipping and training with images from T0 yielded the best accuracy for predicting lung cancer incidence at T2 on a separate test cohort (accuracy = 90.29%; AUC = 0.96), based on an ensemble of 21 models.
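One common way to combine such models is to average their predicted probabilities (soft voting). The sketch below assumes this rule and a 0.5 decision threshold; the abstract does not specify how the 21 models were combined, so the function `soft_vote` and the example probabilities are hypothetical.

```python
import numpy as np

def soft_vote(member_probs, threshold=0.5):
    """Average per-model malignancy probabilities and threshold the mean.

    member_probs: array of shape (n_models, n_cases).
    Returns (mean probability per case, predicted label per case).
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean_prob = member_probs.mean(axis=0)
    labels = (mean_prob >= threshold).astype(int)
    return mean_prob, labels

# Made-up probabilities from three models for two cases.
example = [[0.9, 0.2],
           [0.7, 0.4],
           [0.8, 0.3]]
mean_prob, labels = soft_vote(example)
print(mean_prob, labels)

# The same call works for 3 architectures x 7 seeds = 21 models.
rng = np.random.default_rng(42)
probs_21 = rng.uniform(size=(3, 7, 4)).reshape(21, 4)  # placeholder outputs
ensemble_prob, ensemble_label = soft_vote(probs_21)
```

Averaging reduces the effect of any single model's initialization, which is why variability across seeds can make the ensemble more robust than its members.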

Five conclusions were drawn from this research, which will inform future work. First, we proposed a simple and effective CNN architecture with a small number of parameters, which is useful for smaller (medical) datasets. Second, we showed that features obtained using transfer learning with all the channels of a pre-trained CNN performed better than features extracted using any single channel; we also constructed a new feature set by fusing quantitative features with deep features, which in turn enhanced classification performance. Third, ensemble learning with deep neural networks was a compelling approach that accurately predicted lung cancer incidence at the second screening after the baseline screen, mostly two years later. Fourth, we proposed a method for giving deep features a recognizable definition via semantic or quantitative features. Fifth, deep features were dependent on the scanner parameters, and this dependency was removed using pixel-size-based normalization.