Graduation Year

2022

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Kaiqi Xiong, Ph.D.

Committee Member

Kandethody M. Ramachandran, Ph.D.

Committee Member

Lu Lu, Ph.D.

Committee Member

Nasir Ghani, Ph.D.

Keywords

Adversarial Example, Cybersecurity, Machine Learning, Machine Learning Security

Abstract

Machine translation software, image captioning, grammar checking (e.g., Grammarly), chatbots, real-time captioning and translation, music genre classification, and document classification are a few examples of deep learning applications that achieve outstanding performance in areas where traditional statistical techniques struggle with classification and/or regression. Google Translate processes over 100 billion words daily and can translate between 109 languages instantly, much faster than a human translator. AlphaGo defeated Lee Sedol, an eighteen-time world champion of Go. Microsoft Teams' live captioning transcribes speech accurately in real time as a speaker talks. Deep learning undoubtedly achieves remarkable performance. However, recent studies on adversarial attacks and data poisoning attacks show that deep learning models are vulnerable to both.

In this dissertation, we focus on a special type of data poisoning attack: triggerless clean-label data poisoning attacks. In such an attack, some training instances are manipulated in order to cause the misclassification of a target instance (which itself is never manipulated) in a targeted attack. The main difference between a triggerless data poisoning attack and an adversarial example lies in the threat model: a triggerless data poisoning attack assumes the attacker cannot manipulate the (targeted) test data, whereas an adversarial example assumes the attacker can manipulate test data to cause misclassification. Deep learning models are vulnerable to both attacks.
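To make the contrast concrete, the two threat models can be written as optimization problems. The formulations below are illustrative only and are not taken from this dissertation: the poisoning objective follows the common feature-collision style of triggerless clean-label attacks, where $f$ is the victim classifier with loss $\mathcal{L}$ and feature extractor $\phi$, $x_t$ is the protected target with true label $y_t$, $x_b$ is a base instance carrying the attacker's desired label, and $\beta$ trades off stealth against effectiveness.

Adversarial example (test time): $\max_{\|\delta\|_\infty \le \epsilon} \; \mathcal{L}\big(f(x_t + \delta),\, y_t\big)$

Clean-label poison (training time): $x_p = \arg\min_{x} \; \|\phi(x) - \phi(x_t)\|_2^2 + \beta\,\|x - x_b\|_2^2$

The poison $x_p$ keeps the correct label of its base class, so the training set looks clean under inspection, yet a model retrained on the poisoned set misclassifies the untouched target $x_t$.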

This dissertation research presents solutions and mechanisms for protecting deep learning models against adversarial examples, malicious mislabeling, and data poisoning attacks. Specifically, a new iterative retraining approach for reducing the effectiveness of adversarial examples is presented. To overcome its time complexity, a parallel implementation of the proposed approach is developed to make it scalable. Moreover, an efficient active learning method is developed to defend against malicious mislabeling and data poisoning attacks. The empirical evaluation shows its effectiveness against various adversarial attacks and data poisoning attacks while maintaining performance on the untampered dataset. Finally, a novel approach is introduced to turn an adversarial attack method into a stealthy triggerless clean-label data poisoning attack. All presented methodologies are comprehensively evaluated on either Gaivi or Research Computing's Central Instructional and Research Computing Environment (CIRCE).
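As a rough illustration of the iterative retraining idea, the sketch below augments each training batch with adversarial examples crafted against the current model and retrains on the mixture. It is a minimal hypothetical sketch, not the dissertation's algorithm: it assumes PyTorch, uses FGSM as a stand-in attack, and the round count, epsilon, and single-pass inner loop are arbitrary choices.

import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    # One-step FGSM: perturb x in the direction of the loss gradient's sign.
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

def iterative_adversarial_retraining(model, train_loader, optimizer, rounds=5, eps=0.03):
    # Each round: craft adversarial examples against the current model,
    # then retrain on the clean and adversarial batches together.
    for _ in range(rounds):
        for x, y in train_loader:
            x_adv = fgsm(model, x, y, eps)
            x_mix = torch.cat([x, x_adv])
            y_mix = torch.cat([y, y])
            optimizer.zero_grad()
            F.cross_entropy(model(x_mix), y_mix).backward()
            optimizer.step()
    return model

Because the adversarial examples must be regenerated against the retrained model in every round, the loop is expensive; crafting the examples for different batches in parallel is the natural way to scale it, which is the role the abstract assigns to the parallel implementation.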
