Graduation Year


Document Type




Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Chris P. Tsokos, Ph.D.

Committee Member

Kandethody M. Ramachandran, Ph.D.

Committee Member

Lu Lu, Ph.D.

Committee Member

Yicheng Tu, Ph.D.


Keywords

Biomedical Signal Processing, Data Mining, Non-Homogeneous Poisson Process, Quality of Life, Tree-based Methods, Supervised-Unsupervised Learning


Abstract

Statistical learning is a set of tools for modeling and understanding complex datasets. It is a recently developed area of statistics that blends with parallel developments in computer science, in particular machine learning.

The classification of non-stationary biomedical signals such as the electroencephalogram (EEG) is a challenging problem because of their complexity. Low spatial resolution on the scalp, the curse of dimensionality, and a poor signal-to-noise ratio are the main disadvantages of working with such signals. EEG signals are unstructured data that need preprocessing to extract informative features that are measurable and predictive. In the first two chapters of this dissertation, EEG signals recorded at 14 different locations on the scalp are used to detect random changes in eye state. We investigate this EEG data from two perspectives: classification with and without feature extraction. In the first method, we bypass the feature extraction phase. The SPI index, a transformation adapted from meteorology, is used to transform the data into a more appropriate space. Then, a Bayesian analysis of a non-homogeneous Poisson process (NHPP) in the presence or absence of a change-point (eyes open to closed, or vice versa) is developed using Markov chain Monte Carlo (MCMC). We use the power-law function as the intensity function of the NHPP models. The final classifier is a model selection process between the two NHPP models: in each time frame, the model that fits the data better is selected. The best performance of this state-of-the-art model is an accuracy of 74%.
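As a rough illustration of the power-law NHPP mentioned above, the sketch below writes the intensity, the mean function, a simulator, and the log-likelihood for events observed on (0, t_end]. It assumes the common parameterization lambda(t) = (beta/theta)(t/theta)^(beta-1); the parameterization, the function names, and the parameter values are assumptions made for illustration, and the Bayesian MCMC fitting and change-point model of the dissertation are not reproduced here.

```python
import math
import random


def power_law_intensity(t, beta, theta):
    """Assumed intensity: lambda(t) = (beta/theta) * (t/theta)**(beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)


def power_law_mean(t, beta, theta):
    """Expected number of events in (0, t]: m(t) = (t/theta)**beta."""
    return (t / theta) ** beta


def simulate_power_law_nhpp(beta, theta, t_end, rng):
    """Simulate event times on (0, t_end].

    Arrival times of a unit-rate homogeneous Poisson process are mapped
    through the inverse of the mean function m(t).
    """
    times = []
    gamma = 0.0
    while True:
        gamma += rng.expovariate(1.0)          # next unit-rate arrival
        t = theta * gamma ** (1.0 / beta)      # m^{-1}(gamma)
        if t > t_end:
            return times
        times.append(t)


def log_likelihood(times, beta, theta, t_end):
    """Log-likelihood: sum of log lambda(t_i) minus m(t_end)."""
    n = len(times)
    return (n * math.log(beta)
            - n * beta * math.log(theta)
            + (beta - 1) * sum(math.log(t) for t in times)
            - power_law_mean(t_end, beta, theta))


if __name__ == "__main__":
    rng = random.Random(0)
    events = simulate_power_law_nhpp(beta=1.5, theta=2.0, t_end=50.0, rng=rng)
    print(len(events), log_likelihood(events, 1.5, 2.0, 50.0))
```

A log-likelihood of this form is the basic ingredient that a comparison between a model with a change-point and a model without one would build on.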

In the second method, features are extracted from the EEG data using the fast Fourier transform. We take all of the aforementioned difficulties into consideration and develop a three-layer classifier that addresses the complexities of EEG signals (high dimensionality, noise, and poor spatial information) one at a time, one in each step. The main contributions of this method are the reduction of the number of signals from 14 to 5, an accuracy of 96% on data reframed into one-second segments, obtained in less than 3 seconds, and the extraction of useful information from all channels, even those that seem uninformative at first glance.
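To make the idea of FFT-based features concrete, the following sketch computes the average spectral power of a one-second window inside a frequency band. The sampling rate of 128 Hz, the 8 to 13 Hz band, and the helper names `fft` and `band_power` are assumptions chosen for illustration; the abstract does not state which features the dissertation actually extracts.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def band_power(window, fs, f_lo, f_hi):
    """Mean squared FFT magnitude over bins with f_lo <= frequency <= f_hi."""
    n = len(window)
    spectrum = fft(window)
    bins = [k for k in range(n // 2 + 1) if f_lo <= k * fs / n <= f_hi]
    if not bins:
        raise ValueError("no FFT bins fall inside the requested band")
    return sum(abs(spectrum[k]) ** 2 for k in bins) / len(bins)


if __name__ == "__main__":
    fs = 128  # assumed sampling rate in Hz (hypothetical)
    # Synthetic one-second window containing a pure 10 Hz oscillation.
    window = [math.sin(2 * math.pi * 10 * i / fs) for i in range(fs)]
    print(band_power(window, fs, 8, 13), band_power(window, fs, 20, 30))
```

Computing one such value per channel and per band would turn each window into a short feature vector that a classifier can consume.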

In addition to EEG data, this dissertation explores health-related problems in terms of their impact on quality of life. The data consists of socio-demographic information as well as the psychological background of 1080 individuals from different regions of Italy. This data is analyzed using supervised and unsupervised learning. The supervised learning method combines classical non-parametric and machine learning methods to predict general quality of life with an accuracy of 83%. The developed model is informative and useful both for individuals, to monitor and improve their quality of life, and for administrators, to allocate their resources wisely and direct them to the right target group.

In the unsupervised learning analysis, the individuals are clustered into three categories according to their similarity in socio-demographic, health, and psychological information. The implemented model is based on K-medoids clustering. Such clusters can be used to gain a better understanding of the population for further analysis.
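A minimal sketch of K-medoids clustering is given below. It uses a simple alternating scheme (assign each point to its nearest medoid, then pick the member with the smallest total distance as the new medoid), which is one of several K-medoids variants and is not necessarily the algorithm used in the dissertation. The Manhattan distance, the toy points, and the `init` and `seed` parameters are assumptions for illustration only.

```python
import random


def manhattan(a, b):
    """Manhattan distance between two numeric tuples (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medoids(points, k, init=None, dist=manhattan, max_iter=100, seed=0):
    """Alternating K-medoids; returns (medoid indices, cluster label per point)."""
    if init is None:
        medoids = random.Random(seed).sample(range(len(points)), k)
    else:
        medoids = list(init)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:            # keep the medoid of an empty cluster
                new_medoids.append(m)
                continue
            best = min(members,
                       key=lambda c: sum(dist(points[c], points[j]) for j in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    labels = [min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
              for p in points]
    return medoids, labels


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
           (20, 0), (20, 1), (21, 0)]
    print(k_medoids(toy, k=3, init=[1, 4, 8]))
```

Because medoids are actual observations, each cluster can be summarized by a representative individual, and any dissimilarity measure suited to mixed socio-demographic data could be passed in place of `manhattan`.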