Graduation Year
2024
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Mathematics and Statistics
Major Professor
Kandethody Ramachandran, Ph.D.
Committee Member
Lu Lu, Ph.D.
Committee Member
Seung-Yeop Lee, Ph.D.
Committee Member
Feng Cheng, Ph.D.
Keywords
Statistical Machine Learning, Natural Language Processing (NLP), Sentiment Analysis, Data Science, Ensemble Learning, Random Forest, Transformer
Abstract
Artificial Intelligence (AI) is now part of everyday life. Machine Learning (ML), a branch of AI, has developed rapidly over the past two decades, evolving from statistical learning approaches that emphasize probability and statistics for modeling data, such as Support Vector Machines (SVMs) for classification and regression tasks, to ensemble learning techniques such as Random Forest, Gradient Boosting Machines (GBMs), and stacking. Ensemble learning has become a pivotal concept in contemporary machine learning, enabling practitioners to combine multiple models to improve generalization, accuracy, and robustness. As the field of machine learning progresses, ensemble techniques are poised to remain indispensable tools for tackling intricate real-world challenges.
Since the publication of Google's paper “Attention Is All You Need” at Neural Information Processing Systems 30 (NIPS 2017), the transformer architecture has seen widespread adoption across Natural Language Processing (NLP). It has been employed in a range of settings, from applications using the full seq2seq (encoder–decoder) architecture to those built on only the encoder or decoder component, as exemplified by the increasing popularity of models such as BERT and GPT, respectively. In this dissertation, we explore practical scenarios that illustrate the extensive and diverse NLP applications enabled by the transformer architecture.
This dissertation comprises three distinct research studies. The first study applies ensemble learning to a deception detection problem based on the Miami University Deception Detection Database (MU3D); it introduces a novel approach in which we crafted an ensemble learning model built on random forest and evaluated its performance in the domain of deception detection. The second study is a statistically grounded Natural Language Processing (NLP) analysis in the computer science field; its novelty lies in comparing sentiment before and after text summarization of unstructured Twitter (now rebranded as X) data, using a fine-tuned large language model (LLM) for the summarization. The impact of both studies is to combine statistical exploratory data analysis with machine learning (ML) algorithms and large language models to solve real-life problems and to inspire other researchers in this field. The third study focuses on machine learning and natural language processing to address practical challenges in pharmacovigilance analysis, with a particular emphasis on uncovering Drug-Drug Interactions (DDIs). The discussion is organized into multiple sub-sections, ranging from foundational concepts to the experimental framework, providing a theoretical exposition of the subject matter.
Scholar Commons Citation
Bu, Kun, "Advancing Text Summarization and Classification: Deep Insights from Transformer-Based Statistical Learning" (2024). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/10800
Included in
Artificial Intelligence and Robotics Commons, Medicinal Chemistry and Pharmaceutics Commons, Statistics and Probability Commons
