Graduation Year
2023
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Physics
Major Professor
Vladimir Feygelman, Ph.D.
Co-Major Professor
Ghanim Ullah, Ph.D.
Committee Member
Eduardo G. Moros, Ph.D.
Committee Member
Jimmy Caudell, Ph.D.
Committee Member
Issam El Naqa, Ph.D.
Keywords
autocontouring, radiation, organs at risk, artificial intelligence
Abstract
This dissertation is devoted to the study of deep learning-based autosegmentation in head and neck radiotherapy. Much of the work presented here is motivated by the need to introduce a clinically useful autosegmentation model for head and neck organs at risk, with the aim of reducing inter-observer variation in structure segmentation and improving the time efficiency of the treatment planning process. This dissertation describes autosegmentation approaches, introduces a prototype deep learning-based autosegmentation algorithm trained with carefully curated local gold data, and presents a series of comprehensive evaluations to verify the feasibility of implementing the prototype model in clinical settings.
One of the challenges of adopting a deep learning-based autosegmentation technique in radiotherapy is the need for a large set of carefully curated, high-quality gold data that accurately represents the population being treated. Combined with a deep learning algorithm, the training data plays a critical role in determining the ultimate performance of the autosegmentation model. Although deep learning algorithms do not rely on prior knowledge as heavily as more conventional autosegmentation techniques do, the precise delineation of the various organs in the head and neck region within the training data is crucial in determining the performance of autosegmentation models. These relationships are explored in detail in the chapters of this thesis.
In Chapter 1, we provide a brief overview of the limitations of manual segmentation that necessitate automating the process. We also discuss the evolution of autosegmentation techniques and the current status of the deep learning-based approach.
In Chapter 2, we first trained and evaluated a well-established commercial deep learning-based autosegmentation software to verify the feasibility of the approach and to assess the need for a higher quality autosegmentation model that can generate more clinically useful structures in a fraction of the time. The software was trained with local data, and its performance was compared with that of the same algorithm trained at different institutions, as well as with the gold data. We confirmed that deep learning-based autosegmentation has the potential to be a useful time-saving solution in head and neck (HN) radiotherapy treatment planning. However, due to the complexity of anatomical structures in the HN region, we also found that most of the autosegmented structures in this study required minor or major editing, which defeats the time-saving benefit, especially when repeated for the large number of organs at risk (OARs) in the HN region. This study revealed two main findings. First, the quality of training data plays a critical role in the performance of deep learning models and should align with user preferences. Specifically, the model trained with local data showed superior performance compared to the model trained with data from different institutions. Second, even with carefully curated high-quality local data, the deep learning (DL) model's performance was suboptimal due to inherent imperfections in the algorithm. These findings emphasize the need for ongoing research and development to enhance the accuracy and reliability of DL-based autosegmentation algorithms.
In Chapter 3, we evaluated a newly developed prototype deep learning-based autosegmentation algorithm, built on a fully convolutional network that combines U-Net and V-Net architectures and developed in collaboration with a vendor, and we present our experience in training and evaluating the prototype DL model. The model was trained with more than 600 carefully curated, previously treated HN cases. Our findings show that the prototype model outperformed a commercial algorithm and generated clinically useful HN OAR structures. Specifically, 93% of the autosegmented structures generated by the prototype model were deemed clinically useful, and 20% of these structures required no editing at all, which is a tenfold improvement compared to other models. Furthermore, the prototype model exhibited higher geometric similarity to the gold standard data than the commercial models across all evaluation metrics.
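To make the idea of geometric similarity concrete, the sketch below computes a Dice similarity coefficient between a gold-standard mask and an autosegmented mask. The abstract does not name the geometric metrics used in Chapter 3; Dice is assumed here only because it is a common overlap measure, and the function name and toy masks are hypothetical.

```python
from typing import Sequence


def dice_coefficient(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks of equal length.

    Assumption: each entry is 1 (voxel inside the structure) or 0 (outside),
    and both masks are sampled on the same voxel grid.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Both masks empty: treated as perfect agreement (a convention, not a rule).
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical toy masks: three voxels each, two of them shared.
gold_mask = [0, 1, 1, 1, 0, 0]
auto_mask = [0, 0, 1, 1, 1, 0]
dice = dice_coefficient(gold_mask, auto_mask)  # 2 * 2 / (3 + 3)
```

A value of 1 indicates identical masks and 0 indicates no overlap. Overlap measures of this kind are usually reported alongside surface-distance measures, since a single number cannot capture every kind of contour disagreement.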
In Chapter 4, we evaluated the dosimetric impact of the autosegmented HN OARs generated by the prototype model. To verify that unedited autosegmented OARs can be used in treatment planning while maintaining plan quality, we generated new treatment plans based on the original targets and unedited DL-produced OARs. The new dose distributions were then applied back to the manually delineated structures to test whether the treatment plans generated from the autosegmented structures remained valid for the manually delineated structures. Nearly identical primary target coverage was achieved for the original and re-generated plans, and the areas under the corresponding pairs of dose-volume histogram (DVH) curves were also nearly identical. Of the critical DVH points, 99% met the clinical objectives both when the re-planned dose was evaluated on the autosegmented structures and when it was evaluated on the manual ones. In short, the DL-generated HN OARs resulted in treatment plans of equivalent quality to the original ones.
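The comparison described above, in which one dose distribution is evaluated on two different contours of the same organ, can be sketched as follows. This is a minimal illustration and not the dissertation's method: it assumes equal-volume voxels, and the dose values, masks, and the 30 Gy mean-dose limit are all hypothetical.

```python
from typing import List, Sequence


def structure_doses(dose_gy: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Doses of the voxels inside a structure (mask entry 1 means inside)."""
    inside = [d for d, m in zip(dose_gy, mask) if m]
    if not inside:
        raise ValueError("structure mask is empty")
    return inside


def cumulative_dvh(dose_gy: Sequence[float], mask: Sequence[int],
                   levels_gy: Sequence[float]) -> List[float]:
    """Fraction of the structure volume receiving at least each dose level.

    Assumption: all voxels have equal volume, so a volume fraction is a voxel count ratio.
    """
    inside = structure_doses(dose_gy, mask)
    return [sum(1 for d in inside if d >= level) / len(inside) for level in levels_gy]


def mean_dose(dose_gy: Sequence[float], mask: Sequence[int]) -> float:
    """Mean dose over the voxels inside the structure."""
    inside = structure_doses(dose_gy, mask)
    return sum(inside) / len(inside)


# Hypothetical re-planned dose on a six-voxel grid, in Gy.
dose = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
manual_mask = [1, 1, 1, 1, 0, 0]  # manually delineated OAR
auto_mask = [0, 1, 1, 1, 1, 0]    # autosegmented OAR, shifted by one voxel

levels = [0.0, 20.0, 40.0]
dvh_manual = cumulative_dvh(dose, manual_mask, levels)
dvh_auto = cumulative_dvh(dose, auto_mask, levels)

# Hypothetical clinical objective: mean OAR dose must not exceed 30 Gy.
MEAN_DOSE_LIMIT_GY = 30.0
manual_ok = mean_dose(dose, manual_mask) <= MEAN_DOSE_LIMIT_GY
auto_ok = mean_dose(dose, auto_mask) <= MEAN_DOSE_LIMIT_GY
```

In this toy case the one-voxel shift between the two contours is enough to move the mean dose across the limit, which is why the same dose is checked against both sets of structures before the plans are treated as equivalent.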
In Chapter 5, we explored the potential correlation between inter-observer variation in OAR contouring and radiation-induced toxicities in HN radiotherapy, and assessed whether the prototype DL autosegmentation model could help reduce those toxicities. We divided previously treated HN cases into two groups based on the physicians' different segmentation and treatment planning approaches. The first group of physicians (A) includes the physician who contoured the gold data used to train the prototype model; the other physicians in this group were trained to contour the HN OARs in the same way. The physicians in the other group (B) were trained elsewhere, so inter-observer variation between the two groups was possible. A significant difference in radiation-induced toxicities between the two groups was observed. Patients in group B were hospitalized more frequently, experienced greater weight loss, and had more feeding tube placements. In the analysis of critical DVH points, the percentage of DVH points that failed to meet the clinical objectives was much higher in group B for all OARs. In group B, the autosegmented structures, which mimic the manual structures of group A physicians, received an even higher dose than the manual structures. This indicates that the actual OARs might have received a higher dose than indicated in some of the group B treatment plans. Therefore, the deep learning-based autosegmented structures could help reduce toxicities by generating consistent, high-quality HN OAR structures.
Finally, Chapter 6 provides an overview of the major findings. Additionally, future research plans are introduced, including the development of an automatic dose prediction algorithm and the exploration of correlations between dosimetric parameters and complication probabilities. These efforts aim to achieve the goal of substantial automation of the treatment planning process consistent with the proven clinical outcomes.
Scholar Commons Citation
Koo, Jihye, "Evaluation of a Prototype Deep Learning-based Autosegmentation Algorithm on a High Quality Database of Head and Neck Cancer Radiotherapy Patients" (2023). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/9890