Graduation Year

2024

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Electrical Engineering

Major Professor

Zhuo Lu, Ph.D.

Committee Member

Ismail Uysal, Ph.D.

Committee Member

Nasir Ghani, Ph.D.

Committee Member

Leah Ding, Ph.D.

Committee Member

Xinming Ou, Ph.D.

Keywords

Internet of Things, Machine Learning, Music Copyright, Software and Application Security, Speaker Recognition

Abstract

Adversarial audio attacks pose significant security challenges to real-world audio applications. For example, attackers may manipulate speech to impersonate a speaker and gain access to smart devices such as Amazon Echo. Audio applications fall into two key domains: music and speech. In music, most attacks add a small, noise-like perturbation to the original signal to evade copyright detection; however, this perturbation degrades the music's perceived quality for human listeners. In speech, creating an adversarial example often requires a large number of queries to the target model, a process too cumbersome for practical use in real-world scenarios, such as interacting with a smart device many times.

In this dissertation, we first explore integrating human factors into the adversarial attack loop. Specifically, we conduct a human study to understand how participants perceive perturbations in music signals. Using regression analysis, we model the relationship between deviations in audio features and human-perceived deviation. Based on this human perception model, we propose, formulate, and evaluate a perception-aware attack framework for creating adversarial music.
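To make this idea concrete, the following is a minimal sketch of how a perception-aware attack objective could be optimized. The detector, the two audio features, and the regression weights are illustrative assumptions for the sketch, not the dissertation's actual models or coefficients; in practice the feature set and weights would come from the human study described above.

```python
# Minimal sketch of a perception-aware adversarial music attack.
# Assumptions (not from the dissertation): a differentiable surrogate
# copyright detector `detector` returning a detection score to minimize,
# two illustrative audio features (RMS energy and spectral magnitude),
# and a linear regression perception model with made-up weights.
import torch

def feature_deviation(x, x_adv):
    """Differentiable deviations of simple audio features."""
    rms_dev = (x_adv.pow(2).mean().sqrt() - x.pow(2).mean().sqrt()).abs()
    spec = lambda s: torch.stft(s, n_fft=512, return_complex=True).abs()
    spec_dev = (spec(x_adv) - spec(x)).abs().mean()
    return torch.stack([rms_dev, spec_dev])

def perceived_deviation(devs, w=torch.tensor([0.6, 0.4]), b=0.0):
    """Linear perception model: predicted human-perceived deviation."""
    return (w * devs).sum() + b

def perception_aware_attack(x, detector, steps=200, lr=1e-3, alpha=1.0):
    """Evade the detector while penalizing predicted perceived deviation."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta).clamp(-1.0, 1.0)
        evade_loss = detector(x_adv)            # detection score (lower = evades)
        percep_loss = perceived_deviation(feature_deviation(x, x_adv))
        loss = evade_loss + alpha * percep_loss # trade off evasion vs. audibility
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach().clamp(-1.0, 1.0)
```

The weight alpha controls the trade-off: larger values favor perturbations that the perception model predicts humans will not notice, at the cost of weaker evasion.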

Turning to black-box audio attacks, we investigate adversarial attacks on real-world speaker recognition models under limited practical knowledge. We introduce the concept of parrot training and use state-of-the-art voice conversion methods to generate parrot speech samples, which enables building a surrogate model from knowledge of only a single sentence spoken by the target speaker. We propose a two-stage parrot-trained adversarial example (PT-AE) attack strategy that is more effective than existing strategies while minimizing the required attack knowledge.
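The sketch below illustrates the two-stage idea under stated assumptions: the voice-conversion function, the surrogate speaker classifier, and the PGD-style crafting step are hypothetical placeholders, not the dissertation's actual implementation. The crafted example is then played against the black-box target without further queries.

```python
# Illustrative two-stage sketch: (1) build a surrogate from parrot speech
# generated out of a single target-speaker sentence, (2) craft a transferable
# adversarial example on that surrogate. All components are placeholders.
import torch
import torch.nn.functional as F

def build_parrot_dataset(one_sentence_wave, carrier_utterances, voice_convert):
    """Stage 1a: expand one target-speaker sentence into many 'parrot'
    samples by voice-converting carrier utterances to the target voice."""
    return [voice_convert(src=utt, target_voice=one_sentence_wave)
            for utt in carrier_utterances]

def train_surrogate(surrogate, parrot_samples, other_speaker_samples, epochs=10):
    """Stage 1b: fit a surrogate classifier on parrot (label 1) vs. other (label 0)."""
    data = [(x, 1) for x in parrot_samples] + [(x, 0) for x in other_speaker_samples]
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-4)
    for _ in range(epochs):
        for wave, label in data:
            loss = F.cross_entropy(surrogate(wave.unsqueeze(0)),
                                   torch.tensor([label]))
            opt.zero_grad(); loss.backward(); opt.step()
    return surrogate

def craft_transfer_attack(surrogate, wave, eps=0.002, steps=100):
    """Stage 2: craft a white-box example on the surrogate, then transfer it
    to the black-box target speaker recognition model."""
    delta = torch.zeros_like(wave, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(surrogate((wave + delta).unsqueeze(0)),
                               torch.tensor([1]))   # goal: accepted as the speaker
        loss.backward()
        with torch.no_grad():
            delta -= eps * delta.grad.sign()        # signed-gradient step
            delta.clamp_(-0.01, 0.01)               # keep the perturbation small
            delta.grad.zero_()
    return (wave + delta).detach()
```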
