MS in Computer Science (M.S.C.S.)
Degree Granting Department
Computer Science and Engineering
Lawrence Hall, Ph.D.
Rangachar Kasturi, Ph.D.
Dmitry Goldgof, Ph.D.
Mislabeled Examples, SVM
Mislabeled examples affect the performance of supervised learning algorithms. Two novel approaches to this problem are presented in this thesis. Both methods build on the hypothesis that the large margin and soft margin principles of support vector machines provide the characteristics needed to select mislabeled examples. Extensive experimental results on several datasets support this hypothesis. The support vectors of the one-class and two-class SVM classifiers capture around 85% and 99%, respectively, of the randomly generated label noise examples (10% of the training data) on two character recognition datasets. The number of examples that need to be reviewed can be reduced by creating a two-class SVM classifier with the non-support vector examples, and then reviewing only the support vector examples based on their classification score from that classifier. Experimental results on four datasets show that this method removes around 95% of the mislabeled examples by reviewing only about 14% of the training data. The parameter independence of this method is also verified through the experiments. All the experimental results show that most of the label noise examples can be removed by (re-)examining the selected support vector examples. This property can be very useful when building large labeled datasets.
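The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a linear soft-margin SVM fit by plain subgradient descent as a stand-in for a real SVM solver, approximates the support vectors as the examples whose functional margin is at most 1, and assumes that support vectors are reviewed in order of increasing signed score under the second classifier. The function names, the hyperparameters (`lam`, `lr`, `epochs`), and the synthetic two-blob data are all hypothetical.

```python
import random


def decision_value(w, b, x):
    """Signed score w.x + b of a linear classifier."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM fit by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(hinge loss). Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision_value(w, b, xi) < 1.0:  # hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank_suspects(X, y):
    """Return indices of (approximate) support vectors, most suspicious first."""
    # Stage 1: train on all data; examples on or inside the margin are
    # treated as support vectors (an approximation for this simple solver).
    w, b = train_linear_svm(X, y)
    support = [i for i in range(len(X))
               if y[i] * decision_value(w, b, X[i]) <= 1.0]
    support_set = set(support)
    rest = [i for i in range(len(X)) if i not in support_set]

    # Stage 2: train a second classifier on the non-support-vector examples.
    # If none remain, fall back to the first classifier (assumption).
    if rest:
        w2, b2 = train_linear_svm([X[i] for i in rest], [y[i] for i in rest])
    else:
        w2, b2 = w, b

    # Assumed review order: the lower the signed score y * f(x), the more
    # the second classifier disagrees with the given label.
    return sorted(support, key=lambda i: y[i] * decision_value(w2, b2, X[i]))


def make_noisy_data(n_per_class=100, noise=0.1, seed=0):
    """Two Gaussian blobs in 2-D with a fraction of labels flipped at random."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((-1, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    flipped = set(rng.sample(range(len(y)), round(noise * len(y))))
    for i in flipped:
        y[i] = -y[i]
    return X, y, flipped


if __name__ == "__main__":
    X, y, flipped = make_noisy_data()
    ranked = rank_suspects(X, y)
    found = len(flipped & set(ranked))
    print(f"{len(ranked)} of {len(X)} examples flagged for review; "
          f"{found} of {len(flipped)} flipped labels are among them")
```

In this sketch a reviewer would work through `ranked` from the front and stop once the examples no longer look mislabeled. How many examples are flagged depends on the data and on the assumed hyperparameters, so the output is not comparable to the percentages reported in the abstract.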
Scholar Commons Citation
Ekambaram, Rajmadhan, "Label Noise Cleaning Using Support Vector Machines" (2016). USF Tampa Graduate Theses and Dissertations.