Graduation Year
2025
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Computer Science and Engineering
Major Professor
Xinming Ou, Ph.D.
Committee Member
Lawrence Hall, Ph.D.
Committee Member
Jay Ligatti, Ph.D.
Committee Member
Nasir Ghani, Ph.D.
Committee Member
Doina Caragea, Ph.D.
Committee Member
Ankit Shah, Ph.D.
Keywords
App Representation, Data Leakage, Evaluation Metrics
Abstract
Machine learning (ML) algorithms have achieved remarkable success across various domains, including cybersecurity. Inspired by these advancements, the academic security community has explored numerous ML-based approaches for Android malware detection. While ML holds significant promise in this domain, its practical deployment faces substantial challenges, including data collection, feature selection, app representation across different models, performance instability across datasets, and inherent limitations of learning-based malware detection. These challenges can lead to overly optimistic detection results and weaken the reliability of malware detection frameworks.
Android malware detection has been extensively studied using both traditional ML and deep learning (DL) approaches. Although many state-of-the-art detection models, particularly those based on DL, claim superior performance, they are often evaluated on a limited scale without comprehensive benchmarking against traditional ML models across diverse datasets. This raises concerns about the robustness of DL-based approaches and the potential oversight of simpler, more efficient ML models. In this study, we conduct a systematic evaluation of Android malware detection models across four datasets: three publicly available, recently published datasets and a large-scale dataset we systematically collected. We implement a range of traditional ML and advanced DL models, revealing that while DL models can achieve strong performance, they are often compared against an insufficient number of traditional ML baselines. In many cases, simpler and more computationally efficient ML models yield comparable or even superior results, underscoring the need for rigorous benchmarking in Android malware detection research.
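To make the benchmarking point concrete, the sketch below is an illustrative example only (not the dissertation's evaluation code): it trains a few standard scikit-learn baselines on an app feature matrix and reports their F1 scores for comparison against a DL model's reported score. X_train, y_train, X_test, and y_test are assumed to be prepared elsewhere, e.g. binary permission or API-call indicator vectors with malware/benign labels.

    # Illustrative traditional-ML baseline benchmark (not the dissertation's code).
    # Assumes X_train/X_test are app feature matrices and y_train/y_test are labels.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import LinearSVC
    from sklearn.metrics import f1_score

    def benchmark_baselines(X_train, y_train, X_test, y_test):
        baselines = {
            "RandomForest": RandomForestClassifier(n_estimators=100, n_jobs=-1),
            "LinearSVC": LinearSVC(),
            "kNN": KNeighborsClassifier(n_neighbors=5),
        }
        scores = {}
        for name, model in baselines.items():
            model.fit(X_train, y_train)
            scores[name] = f1_score(y_test, model.predict(X_test))
        return scores  # compare these F1 scores against the one reported for a DL model

If simple models such as these match or exceed a DL model's score on the same splits, the added complexity of the DL approach needs stronger justification, which is the kind of benchmarking the abstract argues for.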
A critical aspect of ML-based malware detection is the numerical representation of apps for training and testing. We identify a widespread phenomenon in which distinct Android apps yield identical or nearly identical representations. In particular, a significant portion of test samples may closely resemble or exactly match representations of apps in the training dataset, leading to data leakage. This issue inflates the reported performance of ML models on the test set, creating an illusion of generalizability. Beyond overly optimistic assessments, data leakage can also produce qualitatively different research conclusions. We present two case studies to illustrate this impact and further examine the real-world implications using a leak-aware detection framework. Our findings demonstrate how such qualitatively different conclusions can lead to incorrect recommendations regarding the most suitable ML models for practical deployment.
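As a rough illustration of the leakage described above (a minimal sketch under the assumption that apps are encoded as binary feature vectors, not the dissertation's leak-aware framework), the following Python functions estimate how many test-set representations exactly duplicate, or nearly duplicate, a training-set representation:

    # Illustrative train/test overlap check (not the dissertation's framework).
    # Assumes X_train and X_test are binary feature matrices of the same dtype,
    # with one row per app.
    import numpy as np

    def exact_duplicate_rate(X_train, X_test):
        """Fraction of test rows whose representation also appears in training."""
        train_rows = {row.tobytes() for row in np.ascontiguousarray(X_train)}
        hits = sum(row.tobytes() in train_rows for row in np.ascontiguousarray(X_test))
        return hits / len(X_test)

    def near_duplicate_rate(X_train, X_test, max_diff=2):
        """Fraction of test rows within max_diff differing features of some training row."""
        hits = 0
        for x in X_test:
            dists = np.count_nonzero(X_train != x, axis=1)  # Hamming distance to each training row
            if dists.min() <= max_diff:
                hits += 1
        return hits / len(X_test)

A high rate from either function suggests that reported test performance partly reflects memorization of training apps rather than generalization to genuinely unseen samples.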
Scholar Commons Citation
Liu, Guojun, "Beyond the Hype: The Fundamental Challenges of Machine Learning-Based Android Malware Detection in Cybersecurity" (2025). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/10881
