Graduation Year
2024
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Electrical Engineering
Major Professor
Zhuo Lu, Ph.D.
Committee Member
Ismail Uysal, Ph.D.
Committee Member
Nasir Ghani, Ph.D.
Committee Member
Leah Ding, Ph.D.
Committee Member
Xinming Ou, Ph.D.
Keywords
Internet of Things, Machine Learning, Music Copyright, Software and Application Security, Speaker Recognition
Abstract
Adversarial audio attacks pose significant security challenges to real-world audio applications. Attackers may manipulate speech to impersonate a speaker, gaining access to smart devices like Amazon Echo. Audio applications span two key domains: music and speech. In music, most attackers add a small noise-like perturbation to the original signal to evade copyright detection. However, this method degrades the music's perceived quality for human listeners. In speech, creating an adversarial example often requires many queries to the target model, a process too cumbersome for practical use in real-world scenarios, such as interacting with a smart device numerous times.
In this dissertation, we first explore the integration of human factors into the adversarial attack loop. Specifically, we conduct a human study to understand how participants perceive perturbations in music signals. Using regression analysis, we model the relationship between audio feature deviations and human-perceived deviations. Based on this human perception model, we propose, formulate, and evaluate a perception-aware attack framework for creating adversarial music.
Turning to black-box audio attacks, we investigate adversarial attacks on real-world speaker recognition models using limited practical knowledge. We introduce the concept of parrot training and leverage state-of-the-art voice conversion methods to generate parrot speech samples, enabling the construction of a surrogate model with knowledge of only a single sentence from the target speaker. We propose a two-stage parrot-trained adversarial example (PT-AE) attack strategy that is more effective than existing strategies while minimizing the required attack knowledge.
Scholar Commons Citation
Duan, Rui, "Advancing Adversarial Audio: Human-in-the-Loop Black-box Attacks" (2024). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/10502