Graduation Year

2025

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Electrical Engineering

Major Professor

Ismail Uysal, Ph.D.

Committee Member

Mia Naeini, Ph.D.

Committee Member

Nasir Ghani, Ph.D.

Committee Member

Robert Karam, Ph.D.

Committee Member

Zhabiz Gharibshah, Ph.D.

Keywords

Feature Selection, Supervised Feature Learning, Slotting Algorithm, Adaptive Training, Data Analysis

Abstract

This dissertation presents the development and application of a novel feature selection method, Probability Weighted Feature Selection (PWFS), designed to address key challenges in machine learning involving high-dimensional, noisy, or biased datasets. The work is a comprehensive exploration of feature selection strategies that enhance the performance, interpretability, and practicality of machine learning models in both engineering systems and educational data mining. PWFS introduces a structured, probabilistic approach to feature selection that adaptively weights features based on their empirical contribution to model performance. This method not only accelerates convergence to near-optimal feature subsets but also supports ensemble learning by using combinatorial analysis and slotting architectures to form decorrelated feature clusters. Spanning four studies, this work contributes to the development of new methodologies and to their application in real-world, high-impact domains.

The initial study focuses on the reliability and cost efficiency of wireless sensor networks (WSNs) deployed in temperature-controlled logistics. A combinatorial machine learning framework was employed to identify optimal sensor placements capable of predicting temperatures at critical locations in the event of sensor failure. The statistical correlations between individual loggers and logger combinations were studied to develop a systematic approach to the optimal configuration and placement of loggers under a cost constraint. Our findings suggest that even under different and incrementally higher cost constraints, empirical approaches such as neural networks can predict the temperature variations at a location with an absent or failed logger within a margin of error comparable to the manufacturer-specified sensor accuracy. By exhaustively training and evaluating over a thousand model configurations, the study demonstrated that sensor redundancy can be minimized while maintaining high prediction accuracy, thereby reducing hardware costs and improving system resilience. In fact, the median test error is 1.02 degrees Fahrenheit when a single sensor is used to predict the remaining locations under the assumption of critical system failure, and it drops to 0.8 and 0.65 degrees Fahrenheit when one or three additional sensors are included in the prediction algorithm. This foundational work highlighted the need for a more scalable, learning-based method to handle large feature spaces with interdependencies, ultimately motivating the creation of PWFS.

Feature selection has been a fundamental research area for both conventional and contemporary machine learning since the beginning of predictive analytics. From early statistical methods such as principal component analysis to more recent, data-driven approaches such as deep unsupervised feature learning, selecting input features to achieve the best objective performance has been a critical component of any learning application. In the second study, we propose a novel, easily replicable, and robust approach called Probability Weighted Feature Selection (PWFS), which randomly selects a subset of features prior to each training-testing regimen and assigns probability weights to each feature based on an objective performance metric such as accuracy, mean-square error, or area under the receiver operating characteristic curve (AUC-ROC). Using these objective metric scores and a weight assignment technique based on golden-ratio-guided iteration, features that yield higher performance become incrementally more likely to be selected in subsequent train-test regimens, whereas the opposite holds for features that yield lower performance. This probability-based search converges to a near-optimal set of features significantly faster than a completely random search of the feature space.
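
To make the first study's combinatorial framework concrete, the following is a minimal sketch, not the dissertation's actual pipeline: it assumes a synthetic matrix of temperature readings, a uniform per-logger cost, and a small scikit-learn neural network, and it enumerates every logger subset within the budget, keeping the one with the lowest held-out median error. All names (temps, cost_per_logger, budget) are illustrative.

    from itertools import combinations

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    temps = rng.normal(40.0, 2.0, size=(500, 6))  # 500 readings x 6 locations (stand-in data)
    cost_per_logger, budget = 1.0, 3.0            # assumed uniform cost and total budget

    locations = range(temps.shape[1])
    best_err, best_subset = float("inf"), None
    for k in range(1, int(budget / cost_per_logger) + 1):
        for subset in combinations(locations, k):
            keep = list(subset)                         # loggers kept online
            rest = [j for j in locations if j not in subset]  # locations to predict
            X_tr, X_te, y_tr, y_te = train_test_split(
                temps[:, keep], temps[:, rest], test_size=0.3, random_state=0)
            model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                 random_state=0).fit(X_tr, y_tr)
            err = np.median(np.abs(model.predict(X_te) - y_te))  # median error, deg F
            if err < best_err:
                best_err, best_subset = err, keep

    print(f"best placement {best_subset} with median held-out error {best_err:.2f} F")
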
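The PWFS loop itself can be sketched in a few lines. The dissertation's exact weight-update rule is not reproduced here; as a stated assumption, this version scales a feature's weight up by the golden ratio when its subset beats the running median score and down otherwise, and the dataset, subset size, and classifier are placeholders.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)   # placeholder benchmark dataset
    n_features = X.shape[1]
    rng = np.random.default_rng(0)

    phi = (1 + 5 ** 0.5) / 2       # golden ratio, used here as the assumed update factor
    weights = np.ones(n_features)  # per-feature selection weights, start uniform
    history = []

    for _ in range(100):           # training-testing regimens
        p = weights / weights.sum()
        subset = rng.choice(n_features, size=8, replace=False, p=p)
        score = cross_val_score(LogisticRegression(max_iter=5000),
                                X[:, subset], y, cv=3).mean()  # objective metric (accuracy)
        # Reward features whose subset beats the running median score; penalize the rest.
        up = not history or score >= np.median(history)
        weights[subset] *= phi if up else 1 / phi
        history.append(score)

    print("highest-weight features:", sorted(np.argsort(weights)[-8:].tolist()))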

We compare our method with twelve popular feature selection algorithms and demonstrate equal or better performance on a range of benchmark datasets. The specific weight assignment scheme also enables expanded applications in which two correlated features can be included in separate clusters of near-optimal feature sets for ensemble learning scenarios.

The third and fourth studies apply PWFS in the context of a National Science Foundation-funded initiative targeting academic performance prediction among undergraduate engineering and psychology students. In the first application, PWFS was employed to mine patterns in action-state orientation survey data across diverse cohorts. Academic underperformance (GPA below 2.0) was treated as an anomaly detection problem: machine learning models were trained on the responses of successful students and evaluated on previously unseen data. PWFS enabled the identification of survey items that consistently contributed to accurate predictions, highlighting the importance of self-regulatory behaviors and extracurricular engagement.
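
A hedged illustration of this anomaly-detection framing, with a synthetic survey matrix and an assumed one-class model (the dissertation does not name IsolationForest), might look as follows: the detector is fit only on responses from students in good standing and then flags unseen responses that deviate from that population.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(1)
    ok_responses = rng.integers(3, 6, size=(300, 20))  # Likert items, successful cohort
    new_responses = rng.integers(1, 6, size=(50, 20))  # unseen cohort to score

    detector = IsolationForest(random_state=0).fit(ok_responses)
    flags = detector.predict(new_responses)            # -1 = anomalous (at-risk) response
    print(f"{(flags == -1).sum()} of {len(flags)} responses flagged as at-risk")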

The final study expanded this line of inquiry by incorporating longitudinal analysis across five academic semesters. A robust matching system was developed to track individual student trajectories while preserving survey anonymity. Leveraging a larger dataset and refined classification objectives (e.g., GPA above or below 3.33), the models demonstrated improved accuracy, with PWFS-selected features revealing the measurable impact of classroom interventions on student study habits. Notably, models trained on pre-intervention data produced significantly more false positives when applied to post-intervention responses, suggesting behavioral improvements not yet reflected in academic outcomes and underscoring the value of early behavioral indicators in predictive analytics.
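
As an illustration only, not the dissertation's actual matching system, one way to link responses across semesters while preserving anonymity is to replace each student-supplied key with a salted, non-reversible digest; the salt, field names, and records below are all assumptions.

    import hashlib

    SALT = b"study-specific-secret"  # assumed per-study salt, kept apart from the survey data

    def pseudonym(student_key: str) -> str:
        """Return a stable, non-reversible token for one student."""
        return hashlib.sha256(SALT + student_key.lower().encode()).hexdigest()[:16]

    surveys = [
        {"key": "jdoe01", "semester": "F21", "gpa_above_3_33": True},
        {"key": "jdoe01", "semester": "S22", "gpa_above_3_33": False},
    ]
    trajectories: dict[str, list[dict]] = {}
    for row in surveys:
        trajectories.setdefault(pseudonym(row.pop("key")), []).append(row)

    for token, rows in trajectories.items():
        print(token, [r["semester"] for r in rows])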

Together, these four studies advance the understanding and implementation of feature selection in practical machine learning applications, introducing PWFS as a scalable, adaptive, and interpretable solution applicable to both technical systems and human-centered domains.
