Graduation Year
2024
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Computer Science and Engineering
Major Professor
Lawrence Hall, Ph.D.
Committee Member
Dmitry Goldgof, Ph.D.
Committee Member
Matthew Schabath, Ph.D.
Committee Member
Sudeep Sarkar, Ph.D.
Committee Member
Ashwin Parthasarathy, Ph.D.
Keywords
Histopathology, Computed Tomography, Machine Learning, Cancer
Abstract
Medical images are indispensable in helping health care professionals make more accurate cancer diagnosis and prognosis decisions. Several imaging modalities exist, including, but not limited to, histopathology or whole slide images (WSI), computed tomography (CT), positron emission tomography (PET), and radiography (i.e., X-ray), each with its own applications in clinical practice.
Today, machine learning and deep learning methods have evolved to the point of practical use. These approaches learn from data and extract knowledge from it, making it possible to automate certain tasks. At the time of writing this dissertation, they have reached human-level performance in general image recognition tasks and have become capable of real-time video analysis, which has made self-driving vehicles possible. They have also been widely adopted for other tasks such as text translation and social media recommendations.
While machine learning algorithms can easily be applied to many types of data, this is not the case for medical images. Medical images are far less abundant than other types of data because of strict collection protocols, and their size and structure make them difficult to handle. For example, histopathology images are usually several gigapixels in size, while computed tomography images are three-dimensional. Furthermore, only a small portion of each image contains a region of interest (ROI), and, unlike most other data, annotating medical images requires domain experts, which is costly. As a result, many datasets lack annotations, which complicates the problem further.
Despite these problems, medical images can be made usable through data preprocessing, which can include finding regions of interest, converting images into a compressed representation, or mitigating data-related issues such as confounding factors. Preprocessing of this kind should improve accuracy regardless of the classifier used and makes it possible to use classifiers that are not suitable for unprocessed images. This dissertation focuses on improving the preprocessing of histopathology and computed tomography images. The emphasis is on unsupervised or weakly supervised methods, so that the findings are useful to a wider range of people who may not have sufficient funding or expertise to create annotations.
For histopathology images, a novel unsupervised, saturation-based tissue segmentation algorithm is proposed that outperforms similar approaches, especially in terms of recall in worst-case scenarios. It is used as the first step in all subsequent histopathology tasks. Next, several improvements are proposed to an existing approach for converting histopathology images into feature vectors, and their effectiveness is evaluated on a survival prediction task. These improvements include determining the optimal patch size and excluding detrimental tissue regions using unsupervised patch scoring, which is also introduced in this dissertation. Finally, a more advanced, meta-learning-based unsupervised semantic segmentation method is proposed. Evaluated on a tumor segmentation task, it accurately detected tumor regions, making it a zero-cost substitute for manual annotations.
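For readers unfamiliar with saturation-based tissue detection, the sketch below illustrates only the general idea and is not the algorithm proposed in this dissertation. Stained tissue tends to be more color-saturated than the near-white slide background, so thresholding the HSV saturation channel separates the two. Otsu's method is swapped in here as the threshold selector purely for illustration. The function names are hypothetical, and the input is assumed to be a small RGB thumbnail held in memory as a NumPy array.

```python
import numpy as np


def saturation_channel(rgb: np.ndarray) -> np.ndarray:
    """Return HSV saturation in [0, 1] for an (H, W, 3) uint8 RGB image."""
    rgb = rgb.astype(np.float64) / 255.0
    cmax = rgb.max(axis=2)
    cmin = rgb.min(axis=2)
    # S = (max - min) / max, defined as 0 for pure black pixels.
    sat = np.zeros_like(cmax)
    nonzero = cmax > 0
    sat[nonzero] = (cmax[nonzero] - cmin[nonzero]) / cmax[nonzero]
    return sat


def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Return the bin edge that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2
    weights = hist.astype(np.float64) / hist.sum()
    w0 = np.cumsum(weights)            # probability of the "background" class
    mu = np.cumsum(weights * centers)  # cumulative first moment
    mu_total = mu[-1]
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    between[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(between))
    # Bins are half-open [a, b), so values >= edges[k + 1] fall in the upper class.
    return float(edges[k + 1])


def tissue_mask(rgb: np.ndarray) -> np.ndarray:
    """Hypothetical helper: True where a pixel is likely stained tissue."""
    sat = saturation_channel(rgb)
    return sat >= otsu_threshold(sat)
```

A single global threshold like this can miss faintly stained tissue, which is why worst-case recall is a meaningful metric when comparing tissue segmentation methods.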
For computed tomography, the issue of confounding and potentially confounding factors is investigated, and a comprehensive strategy for handling them is developed. It includes detecting these factors from image metadata and subsequently mitigating them with an approach based on ComBat harmonization. All proposed image preprocessing methods were shown to increase accuracy without the need to acquire more data, use a more sophisticated classifier, or spend funds on data annotation.
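ComBat estimates per-batch location and scale parameters for each feature and shrinks them with empirical Bayes. As a rough illustration only, the following sketch performs the simpler location-scale step without empirical Bayes shrinkage or covariate modeling, so it is neither full ComBat nor the strategy developed in the dissertation. It assumes image-derived features in a NumPy array and one batch label per scan (for example, a hypothetical scanner manufacturer read from metadata). The function name is hypothetical.

```python
import numpy as np


def location_scale_harmonize(features, batches):
    """Simplified ComBat-style adjustment (no empirical Bayes, no covariates).

    features: (n_samples, n_features) array of image-derived features.
    batches:  (n_samples,) labels of a suspected confounder, e.g. a
              hypothetical scanner manufacturer taken from metadata.
    """
    x = np.asarray(features, dtype=np.float64)
    batches = np.asarray(batches)
    labels = np.unique(batches)
    grand_mean = x.mean(axis=0)
    batch_means = {b: x[batches == b].mean(axis=0) for b in labels}
    # Residuals from each sample's own batch mean give the pooled
    # within-batch standard deviation used as the common target scale.
    resid = x - np.stack([batch_means[b] for b in batches])
    sigma = np.sqrt((resid ** 2).mean(axis=0))
    sigma[sigma == 0] = 1.0
    out = np.empty_like(x)
    for b in labels:
        idx = batches == b
        sd_b = x[idx].std(axis=0)
        sd_b[sd_b == 0] = 1.0
        # Remove the batch's own location and scale, then map onto the
        # shared location (grand mean) and scale (pooled within-batch SD).
        out[idx] = (x[idx] - batch_means[b]) / sd_b * sigma + grand_mean
    return out
```

After this adjustment every batch shares the same per-feature mean and standard deviation. Without covariates, it also removes real biological differences that happen to correlate with the batch, which is one reason full ComBat allows covariates to be specified and preserved.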
Scholar Commons Citation
Fetisov, Nikolai, "Improving Medical Image Classification Accuracy Through Unsupervised Segmentation and Confounder Mitigation with Limited Data" (2024). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/10805
