Graduation Year

2023

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Electrical Engineering

Major Professor

Nasir Ghani, Ph.D.

Committee Member

Ismail Uysal, Ph.D.

Committee Member

Zhixin Miao, Ph.D.

Committee Member

Srinivas Katkoori, Ph.D.

Committee Member

Elias Bou-Harb, Ph.D.

Keywords

AI, Cybersecurity, Malware

Abstract

Ransomware is a form of malware that uses encryption to make data inaccessible to legitimate users. This cyberthreat has emerged as one of the most serious security challenges today. A wide range of ransomware families have been developed and deployed, causing immense damage to governments, corporations, and private users. As this threat continues to spread, governments across the world are taking serious steps to limit its reach. Researchers have also proposed a range of ransomware detection and attribution schemes. Most of these use advanced machine learning (ML) techniques to process and analyze real-world empirical data from executable files, network traces, and host system logs.

Nevertheless, many ransomware analysis studies have used datasets containing a mix of older families that target Windows 7/8 systems (from the mid-2010s). Many of these efforts also assume centralized machine learning setups. However, as ransomware threats continue to proliferate and diversify, it is becoming increasingly difficult to collect and analyze massive amounts of files and trace data at a single site. This raises major privacy and scalability concerns. Transmitting and sharing sensitive end-user data and network logs off-site is problematic for many organizations. Moreover, performing all data pre-processing and machine learning tasks at a single location imposes high computational burdens and bandwidth overheads. These limitations complicate the real-world application of many existing ransomware analysis schemes.

In light of the above, there is a pressing need to develop improved solutions to detect and mitigate the latest ransomware cyberthreats, particularly those targeting current Windows 10/11 users. These solutions should preserve user privacy and scale well to meet future demands. Ideally, they should target ransomware early in the kill chain to minimize potential damage.

To address these challenges, this dissertation presents a detailed study of ransomware analysis, focusing on detection and attribution. First, an up-to-date repository is curated by collecting some of the latest ransomware binary files. This dataset is processed using static analysis (feature engineering) and evaluated with several well-established ML classifiers. Next, a novel distributed ransomware analysis solution based on the federated learning (FL) paradigm is proposed. This framework addresses key privacy and scalability concerns, and its performance is compared against various centralized ML schemes. Finally, modifications to the distributed FL approach are presented to handle the realistic case of imbalanced datasets. Together, these contributions provide a strong foundation for developing practical ransomware solutions.
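To illustrate why FL eases the privacy and bandwidth concerns described above, here is a minimal sketch of the generic Federated Averaging (FedAvg) aggregation step. In this step, each site trains on its own data and shares only a model parameter vector, and a coordinating server combines these vectors weighted by each site's sample count. This is a textbook illustration under stated assumptions, not the specific framework proposed in this dissertation. The function name `fedavg` and the flat-list parameter representation are hypothetical choices made for the example.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Hypothetical FedAvg aggregation: sample-weighted mean of client parameters.

    Assumes each client sends a flat parameter vector of the same length,
    plus the number of local training samples it used. Raw data never
    leaves the client; only these vectors are shared.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")

    global_weights = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        share = n_samples / total  # clients with more data get more influence
        for i, value in enumerate(weights):
            global_weights[i] += share * value
    return global_weights


# Usage: two hypothetical sites holding 1 and 3 local samples respectively.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In this example the second site holds three quarters of the samples, so the aggregated vector is pulled toward its parameters. Plain sample-count weighting like this is also one reason imbalanced client datasets need special handling in practice.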
