Graduation Year
2023
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Electrical Engineering
Major Professor
Nasir Ghani, Ph.D.
Committee Member
Ismail Uysal, Ph.D.
Committee Member
Zhixin Miao, Ph.D.
Committee Member
Srinivas Katkoori, Ph.D.
Committee Member
Elias Bou-Harb, Ph.D.
Keywords
AI, Cybersecurity, Malware
Abstract
Ransomware is a form of malware that uses encryption methods to make data inaccessible to legitimate users. This cyberthreat has emerged as one of the most serious security challenges today, and a wide range of ransomware families have been developed and deployed, causing immense damage to governments, corporations, and private users. As this malware type continues to expand, governments across the world are taking serious steps to limit its reach. In addition, researchers have proposed a range of ransomware detection and attribution schemes, most of which use advanced machine learning (ML) techniques to process and analyze real-world empirical data from executable files, network traces, and host system logs.
Nevertheless, many studies on ransomware analysis have used datasets containing a mix of older families targeting Windows 7/8 systems (from the mid-2010s time frame). Furthermore, many of these efforts assume centralized machine learning setups. However, as ransomware threats continue to proliferate and diversify, it is becoming increasingly difficult to collect and analyze massive amounts of files and trace data at a fixed site. In particular, there are major privacy and scalability concerns here. Namely, the off-site transmission and sharing of sensitive end-user data and network logs is problematic for many organizations. Furthermore, performing all data pre-processing and machine learning tasks at a single location imposes high computational burdens and bandwidth transfer overheads. These limitations will inevitably complicate the real-world application of many existing ransomware analysis schemes.
In light of the above, there is a pressing need to develop improved solutions to detect and mitigate the latest ransomware cyberthreats, particularly those targeting current Windows 10/11 users. These offerings should also ensure user privacy and provide good scalability for handling future demands. Preferably, these solutions should also target ransomware early in the kill-chain sequence to minimize potential damage.
To address these challenges, this dissertation presents a detailed study on ransomware analysis with a focus on detection and attribution. First, an up-to-date repository is curated by collecting some of the latest ransomware binary files. This dataset is processed using static analysis (feature engineering) and evaluated using several well-established ML classifiers. Subsequently, a novel distributed ransomware analysis solution is proposed based upon the federated learning (FL) paradigm. This framework addresses key privacy and scalability concerns, and its performance is compared to various centralized ML schemes. Further modifications to the distributed FL approach are also presented to handle the realistic case of imbalanced datasets. As such, these contributions provide a strong foundation from which to develop practical ransomware solutions.
Scholar Commons Citation
Vehabovic, Aldin, "Machine Learning Approach for Static Ransomware Analysis" (2023). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/10773
