MS in Electrical Engineering (M.S.E.E.)
Ravi Sankar, Ph.D.
Alexandro Castellanos, Ph.D.
Kwang-Cheng Chen, Ph.D.
3D Localization, CNN Deep Learning, Speech Signal Processing, Stereo Sound
As an acoustic signal propagates through space as a carrier of information, it carries the information itself. It also carries the position of its source and the source's physical characteristics. The signal travels from the source to its surroundings as mechanical vibration, in the form of longitudinal waves, through a medium such as air, water, or steel. A conventional single audio capture device cannot record the source's position or its characteristics. As a result, the signals handled in audio signal processing are mixtures of several sources that have already been shaped by spatial reverberation. Studying sound source recognition and localization in three-dimensional space matters because it can help a computer reconstruct specific information about a sound source through artificial-intelligence-based acoustic processing. With that information, the computer can separate a single source from its environment or synthesize a detailed stereo source for a virtual reality scene. For instance, in an acoustic environment with several background noises, it is not possible to filter all of them out of the received audio signal. In that case, the system can use a convolutional neural network to learn the features of the target sound source and then strip off the undesired sound signals. This research is intended to support better machine learning algorithms in audio and in other areas of speech signal processing. In this thesis, sound source localization and recognition will be achieved in two major parts. In the first part, stereo signals in three-dimensional space will be collected by a sensor array, and the signals of the individual microphone units will be compared by correlation.
After the comparison, the time delay between the signals will be measured with a time difference of arrival (TDOA) algorithm to obtain the position of the sound source. In the second part, the location information will be used to reduce the original multi-source signal to relatively independent single-source signals. After noise reduction, these signals will be cross-compared in the time domain and the frequency domain. A convolutional neural network will then identify the features of the target audio. Finally, the results of the two parts will be combined to analyze the source of the acoustic signal. The resulting system integrates sound source recognition with sound source localization. It can effectively identify active signals against background noise in an indoor environment and achieves good recognition accuracy. It can help a computer system perceive its surroundings and monitor the three-dimensional coordinates of specific sound sources. This research has a wide range of applications in acoustic recognition and in sound source localization and monitoring.
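The localization step in the first part depends on two things: estimating the delay between microphone signals from their cross-correlation, and turning that delay into geometry. The Python sketch below illustrates the idea for a single pair of microphones. It is not the author's implementation. It assumes a far-field (plane-wave) source, a speed of sound of 343 m/s, and the hypothetical helper names `cross_correlation_lag`, `tdoa_seconds`, and `far_field_angle`. One pair yields only a single angle. A three-dimensional estimate like the one in the thesis would need to combine TDOAs from several pairs in the array, which this sketch does not attempt.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed value)


def cross_correlation_lag(x, y, max_lag):
    """Return the lag (in samples) that maximizes sum(x[i] * y[i + lag]).

    A positive result means y is a delayed copy of x.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, xi in enumerate(x):
            j = i + lag
            if 0 <= j < len(y):
                score += xi * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def tdoa_seconds(x, y, fs, max_lag):
    """Time difference of arrival (s) of y relative to x at sample rate fs (Hz)."""
    return cross_correlation_lag(x, y, max_lag) / fs


def far_field_angle(tau, mic_spacing, c=SPEED_OF_SOUND):
    """Angle of arrival in degrees from broadside for one microphone pair.

    Assumes a plane wave, so the path difference is c * tau = d * sin(theta).
    """
    s = c * tau / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp in case noise pushes |s| past 1
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    fs = 16000       # sample rate in Hz (assumed)
    spacing = 0.2    # microphone spacing in metres (assumed)
    true_delay = 5   # samples by which microphone 2 hears the source later
    # Largest physically possible delay for this spacing, in samples.
    max_lag = math.ceil(spacing / SPEED_OF_SOUND * fs)

    rng = random.Random(0)
    mic1 = [rng.uniform(-1.0, 1.0) for _ in range(400)]  # broadband test signal
    mic2 = [0.0] * true_delay + mic1[:-true_delay]       # delayed copy of mic1

    tau = tdoa_seconds(mic1, mic2, fs, max_lag)
    print(f"estimated TDOA: {tau * 1e6:.1f} us")
    print(f"angle of arrival: {far_field_angle(tau, spacing):.1f} degrees")
```

The direct lag search is deliberately brute force so that the correlation step stays visible. A practical system would more likely compute the correlation in the frequency domain, for example with an FFT-based generalized cross-correlation, which is faster and more robust to reverberation.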
Scholar Commons Citation
Xu, Cong, "Spatial Stereo Sound Source Localization Optimization and CNN Based Source Feature Recognition" (2020). USF Tampa Graduate Theses and Dissertations.