Graduation Year

2021

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Electrical Engineering

Major Professor

Ravi Sankar, Ph.D.

Committee Member

Ismail Uysal, Ph.D.

Committee Member

Ehsan Sheybani, Ph.D.

Committee Member

Sriram Chellappan, Ph.D.

Committee Member

Supraja Anand, Ph.D.

Keywords

Dysarthria, Dysphonia, LSTM Autoencoders, Overfit Factor, Phonatory Analysis, Pitch Synchronous Segmentation

Abstract

Neurodegenerative diseases affect millions of people around the world. The progressive degeneration worsens the symptoms, heavily impacting the quality of life of patients as well as caregivers. Speech production is one of the physiological processes affected by neurodegenerative diseases such as Alzheimer’s disease, amyotrophic lateral sclerosis (ALS) and Parkinson’s disease (PD). Speech is the most basic form of communication, and neurodegeneration degrades speech production, thereby reducing social interaction and mental well-being. PD is the second most common neurodegenerative disease, affecting speech production in 90% of diagnosed individuals. Speech analysis methods for PD in clinical practice are primarily perceptual. Acoustic analysis of speech impairments could help identify speech patterns that differ between pathological and healthy speech, and this knowledge can benefit early diagnosis and telemonitoring applications. Speech analysis is especially attractive because it is non-invasive, fast and easy to implement remotely. Owing to these merits, studies using established analysis methods have focused on speech and have improved steadily over the past decade.

The objective of most research studies working with Parkinsonian speech is the development of automatic evaluation methods. Most of these studies use sustained vowel phonations because they are simple to analyze. Fewer studies have focused on connected speech tasks such as conversational speech, passage readings or monologues, owing to the reduced control over the data and the greater complexity of analysis. Yet the typical speech impairments observed in individuals with PD are identified in connected speech, and studies have shown the superiority of connected speech over sustained phonations in disease detection. The traditional processing steps adopted in existing methods have also been criticized for spectral distortion and smoothing effects. In this dissertation, a new framework for the automatic evaluation of connected speech is developed and compared to the conventional methods.

In contrast to the existing methods, the proposed framework uses pitch synchronous segmentation of the voiced components of connected speech to ensure consistency in processing, avoiding distortion and smoothing effects. Novel pitch synchronous features (PSFs) quantifying cycle-to-cycle perturbations are extracted from each voiced segment, and the covariances of the first-order differences in these features are used to train classifiers. This methodology helps extract the phonatory and articulatory deficits that are hard to quantify with traditional methods. The impact of pitch synchronous segmentation has also been tested with unsupervised feature extraction using an LSTM autoencoder.
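To make the PSF covariance idea concrete, the following is a minimal Python sketch rather than the dissertation's implementation: it splits a voiced segment into approximate glottal cycles using a simple peak-picking stand-in for pitch marking, computes a few illustrative per-cycle features (cycle duration, peak amplitude and energy are assumptions, not the actual PSF set), takes their first-order differences across cycles, and summarizes the cycle-to-cycle perturbations with a covariance matrix.

```python
# Hedged sketch of a pitch-synchronous feature (PSF) covariance pipeline.
# Feature choices and the peak-based cycle marking are illustrative assumptions.
import numpy as np
from scipy.signal import find_peaks

def pitch_synchronous_cycles(voiced, sr, f0_hz):
    """Split a voiced segment into approximate glottal cycles using waveform
    peaks spaced roughly one pitch period apart (a crude pitch-mark stand-in)."""
    min_period = int(sr / (f0_hz * 1.3))            # allow ~30% pitch variation
    peaks, _ = find_peaks(voiced, distance=min_period)
    # Each consecutive pair of peaks bounds one approximate cycle.
    return [voiced[a:b] for a, b in zip(peaks[:-1], peaks[1:]) if b > a]

def psf_covariance(voiced, sr, f0_hz):
    """Per-cycle features -> first-order differences across cycles ->
    covariance matrix, flattened into a fixed-length classifier input."""
    cycles = pitch_synchronous_cycles(voiced, sr, f0_hz)
    feats = np.array([
        [len(c) / sr,              # cycle duration (s)
         np.max(np.abs(c)),        # peak amplitude
         np.sum(c ** 2)]           # cycle energy
        for c in cycles
    ])
    deltas = np.diff(feats, axis=0)                  # cycle-to-cycle perturbations
    return np.cov(deltas, rowvar=False).ravel()

# Example with a synthetic 150 Hz "voiced" segment sampled at 16 kHz.
sr, f0 = 16000, 150.0
t = np.arange(0, 0.5, 1 / sr)
segment = np.sin(2 * np.pi * f0 * t) * (1 + 0.02 * np.random.randn(t.size))
print(psf_covariance(segment, sr, f0).shape)         # (9,) for 3 per-cycle features
```

Summarizing the per-cycle differences with a covariance matrix yields one fixed-length vector per voiced segment regardless of how many cycles it contains, which is what allows segments of varying duration to feed a standard classifier.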

The results of this study are encouraging. Deviations from the established methods incorporated in the proposed framework are evaluated systematically. Evaluation of the existing methods on sustained phonations shows an impressive 92% cross-validation accuracy, but when tested on a different dataset, accuracy falls to 50%. With the proposed framework using PSF covariances, the cross-validation accuracy was 80% and the test accuracy on a different dataset was 72%. With LSTM autoencoders, the cross-validation and test accuracies were 89% and 73%, respectively. Thus, the traditional methods deliver excellent performance on familiar data but fail with new data, whereas the proposed framework using PSF- and autoencoder-based features delivers comparable performance on familiar and new datasets.
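The cross-corpus evaluation described above can be sketched as follows; the SVM classifier, the 5-fold split and the placeholder feature arrays are illustrative assumptions, not the dissertation's actual experimental setup.

```python
# Hedged sketch: report k-fold cross-validation accuracy on one corpus and,
# separately, accuracy on an independent corpus to expose overfitting to
# familiar data. Classifier choice and data are placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate(X_train, y_train, X_other, y_other, k=5):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    cv_acc = cross_val_score(clf, X_train, y_train, cv=k).mean()
    clf.fit(X_train, y_train)                  # refit on the full familiar corpus
    test_acc = clf.score(X_other, y_other)     # accuracy on the unseen corpus
    return cv_acc, test_acc

# Random data standing in for PSF-covariance feature vectors and PD/healthy labels.
rng = np.random.default_rng(0)
X_a, y_a = rng.normal(size=(60, 9)), rng.integers(0, 2, 60)
X_b, y_b = rng.normal(size=(40, 9)), rng.integers(0, 2, 40)
print(evaluate(X_a, y_a, X_b, y_b))
```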
