Graduation Year

2020

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Dmitry B. Goldgof, Ph.D.

Co-Major Professor

Lawrence O. Hall, Ph.D.

Committee Member

Sudeep Sarkar, Ph.D.

Committee Member

Ashwin B. Parthasarathy, Ph.D.

Committee Member

Robert J. Gillies, Ph.D.

Keywords

Computed Tomography, Diagnosis, Lung Cancer

Abstract

Lung cancer (LC) causes more deaths than any other type of cancer. According to the American Cancer Society, 135,720 deaths in the USA during 2020 are projected to be associated with LC. The 5-year patient survival rate was reported as 16% in 1986 and 19% in 2019. One reason the survival rate remains low is that most patients are diagnosed at stage III or IV. In contrast, the National Lung Screening Trial reported a 5-year survival rate of 70% for stage IA patients after surgical resection.

CT screening detects pulmonary nodules that must then be classified as benign or malignant. Radiomics is based on the concept that quantitative features extracted from medical images can be used, together with machine learning methods, to classify abnormal tissue as benign or malignant. Such computer-aided decision-making (CAD) systems have been shown to be effective tools for patient diagnosis, treatment response prediction, cancer aggressiveness estimation, and gene mutation type detection. Conventional radiomic features describe the size, shape, location, and structural patterns (texture) of tissue. Texture features are commonly computed over the entire nodule, so they average over the different texture patterns present within it. In contrast, another class of algorithms detects nodule subregions with similar properties, such as texture, called habitats, as part of the feature extraction step and uses information about these habitats to describe the nodule.
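To make the averaging argument concrete, the minimal Python sketch below contrasts a whole-nodule statistic with a per-habitat description. It is a hypothetical illustration, not the dissertation's habitat-revealing algorithm: for brevity it clusters raw voxel intensities with a plain 1-D k-means, whereas the text describes habitats defined by properties such as texture. The function names `kmeans_1d` and `habitat_features`, and the choice of k, are assumptions.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=50):
    """Plain k-means on a 1-D array; returns a label per value and the centroids."""
    # Start from evenly spaced quantiles so the result is deterministic.
    centroids = np.quantile(values, np.linspace(0.0, 1.0, k))
    labels = np.zeros(values.shape[0], dtype=int)
    for _ in range(n_iter):
        # Assign every value to its nearest centroid.
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        updated = np.array([
            values[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


def habitat_features(image, mask, k=2):
    """Hypothetical habitat descriptor: cluster the intensities inside the
    nodule mask into k habitats and return, ordered by habitat mean, the
    fraction of nodule voxels in each habitat and its mean intensity."""
    values = image[mask].astype(float)
    labels, centroids = kmeans_1d(values, k)
    order = np.argsort(centroids)
    fractions = np.array([np.mean(labels == j) for j in order])
    means = np.array([values[labels == j].mean() for j in order])
    return fractions, means


if __name__ == "__main__":
    # Synthetic 10 x 10 "nodule": left half -600 HU, right half 40 HU.
    image = np.zeros((10, 10))
    image[:, :5] = -600.0
    image[:, 5:] = 40.0
    mask = np.ones_like(image, dtype=bool)

    print("whole-nodule mean:", image[mask].mean())
    fractions, means = habitat_features(image, mask, k=2)
    print("habitat fractions:", fractions)
    print("habitat means:", means)
```

On this synthetic nodule the whole-nodule mean is -280 HU, a value that matches neither subregion, while the habitat features recover two equal-sized habitats with means of -600 HU and 40 HU.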

This dissertation introduces new algorithms designed to improve the performance of patient diagnostic systems and of lung cancer tumor aggressiveness categorization. Diagnosis experiments were performed on the National Lung Screening Trial (NLST) dataset. Cancer aggressiveness estimation experiments were performed on a set of patients diagnosed with adenocarcinoma at the H. Lee Moffitt Cancer Center & Research Institute. Because reported nodule sizes vary widely, the dataset was split into size categories, and a CAD system was designed individually for each size group. As an extension of the size-split project, delta features, which characterize temporal changes in a nodule, were computed and added to the feature set, and a lung cancer diagnosis system that uses both baseline and delta features is reported. A novel habitat-revealing algorithm is presented, and its use for lung cancer diagnosis and lung cancer aggressiveness classification is described in detail. Given the benefits each of the developed approaches showed as an independent method, a delta habitat-revealing algorithm was designed. It quantifies the habitats within a nodule and how these habitats change over time. Its performance was evaluated on the NLST dataset, so patients were again split into size groups. Finally, we designed several experiments to show that size is an important feature not only in clinical practice and radiomics but also for convolutional neural networks (CNNs) that process only image data. It is shown that when warping (up-sampling) is applied as a preprocessing step, nodule size is encoded in the texture and decoded by the CNN for decision making.
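The size-group split and the delta features can be sketched as follows. This is a minimal Python illustration under stated assumptions: the diameter cut-offs in `SIZE_BINS_MM` are hypothetical (the abstract does not give the thresholds used), delta features are taken as the simple follow-up-minus-baseline difference of each feature (relative change is another common choice), and all helper names are illustrative. Each size group's rows would then be used to train its own classifier, which is not shown.

```python
import numpy as np

# Hypothetical cut-offs on longest nodule diameter (mm); the thresholds used in
# the dissertation are not stated in this abstract.
SIZE_BINS_MM = (10.0, 20.0)
SIZE_NAMES = ("small", "medium", "large")


def size_group(diameter_mm):
    """Map a nodule diameter to a size-group name using the assumed cut-offs."""
    return SIZE_NAMES[int(np.searchsorted(SIZE_BINS_MM, diameter_mm, side="right"))]


def split_by_size(diameters_mm):
    """Return {group name: [row indices]} so each group can get its own model."""
    groups = {name: [] for name in SIZE_NAMES}
    for i, d in enumerate(diameters_mm):
        groups[size_group(d)].append(i)
    return groups


def delta_features(baseline, follow_up):
    """Assumed definition: per-feature change between two scans of the same
    nodule (follow-up minus baseline)."""
    return np.asarray(follow_up, dtype=float) - np.asarray(baseline, dtype=float)


def baseline_plus_delta(baseline, follow_up):
    """Feature matrix with baseline features followed by delta features."""
    baseline = np.asarray(baseline, dtype=float)
    return np.hstack([baseline, delta_features(baseline, follow_up)])


if __name__ == "__main__":
    diameters = [6.0, 14.0, 25.0, 8.0]
    print(split_by_size(diameters))
    # Two illustrative features per nodule at two time points.
    base = [[1.0, 2.0], [3.0, 1.0], [5.0, 0.5], [0.8, 2.2]]
    follow = [[1.5, 1.0], [3.0, 1.2], [6.0, 0.4], [0.8, 2.0]]
    print(baseline_plus_delta(base, follow))
```

With these assumed cut-offs, the four example diameters are split into `{'small': [0, 3], 'medium': [1], 'large': [2]}`, and each nodule's row holds its two baseline features followed by their two deltas.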

Developing CAD systems for each nodule size group independently improved the nodule classification area under the receiver operating characteristic curve (AUROC) on the NLST dataset from 0.69 to 0.79. The inclusion of delta features raised CAD classification AUROC to 0.86 on the NLST. In a set of adenocarcinoma patients, features produced by the habitat-revealing algorithm improved lung cancer patient survival time classification AUROC from 0.71 to 0.91, a statistically significant improvement. Combining the delta-habitat and conventional radiomic feature sets yielded AUROCs of 0.91, 0.87, and 0.92 for the “small”, “medium”, and “large” size groups in the NLST dataset. Finally, both a CNN trained from scratch to differentiate “small” from “large” nodules and a CNN originally trained to classify cancer/non-cancer nodules and then fine-tuned to classify size categories achieved accuracy above 80% and AUROC above 0.80 across a variety of “small”/“large” labeling methods.
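All of the results above are reported as AUROC. As a reminder of what that number measures, the generic sketch below computes AUROC from its pairwise (Mann-Whitney) definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. An AUROC of 0.5 corresponds to chance and 1.0 to perfect separation. This is not code from the dissertation; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative),
    counting ties as 1/2 (the Mann-Whitney U statistic divided by P * N).
    Uses all P * N pairs, which is fine for small evaluation sets."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


if __name__ == "__main__":
    # 1 = malignant, 0 = benign; scores are a classifier's malignancy scores.
    y = [1, 1, 0, 0]
    s = [0.9, 0.4, 0.6, 0.1]
    print(auroc(y, s))  # 0.75: 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise form makes the interpretation explicit, at the cost of O(P·N) memory; for large evaluation sets a rank-based or library implementation is preferable.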
