Graduation Year
2010
Document Type
Thesis
Degree
M.S.C.S.
Degree Granting Department
Computer Science
Major Professor
Lawrence O. Hall, Ph.D.
Committee Member
Dmitry Goldgof, Ph.D.
Committee Member
Steven Eschrich, Ph.D.
Keywords
machine learning, ensembles, bootstrapping, self-training, stratification
Abstract
Semi-supervised self-learning algorithms have been shown to improve classifier accuracy under a variety of conditions. In this thesis, semi-supervised self-learning using ensembles of random forests and fuzzy c-means clustering similarity was applied to three data sets to show where improvement is possible over random forests alone. Two of the data sets are emulations of large simulations in which the data may be distributed. Additionally, the ratio of majority to minority class examples in the training set was altered to examine the effect of training set bias on performance when applying the semi-supervised algorithm.
Scholar Commons Citation
Korecki, John Nicholas, "Semi-Supervised Self-Learning on Imbalanced Data Sets" (2010). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/1686