Graduation Year
2017
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Computer Science and Engineering
Major Professor
Lawrence O. Hall, Ph.D.
Co-Major Professor
Dmitry B. Goldgof, Ph.D.
Committee Member
Rangachar Kasturi, Ph.D.
Committee Member
Sudeep Sarkar, Ph.D.
Committee Member
Ravi Sankar, Ph.D.
Committee Member
Thomas Sanocki, Ph.D.
Keywords
Mislabeled Examples, SVM, Semi-supervised Learning, Adversarial Label Noise, Finding Malwares
Abstract
Large-scale datasets collected using non-expert labelers are prone to labeling errors. Errors in the given labels, known as label noise, affect classifier performance, classifier complexity, and class proportions, among other things. In some problems, a relatively small but important class needs to have all of its examples identified. Typical solutions to the label noise problem involve creating classifiers that are robust or tolerant to errors in the labels, or removing the suspected examples using machine learning algorithms. Finding the label noise examples through a manual review process is largely unexplored because of the cost and time involved. Nevertheless, we believe it is the only way to create a dataset free of label noise. This dissertation proposes a solution that exploits the characteristics of the Support Vector Machine (SVM) classifier and the sparsity of its solution representation to identify uniform random label noise examples in a dataset. The method is illustrated on problems involving two large-scale real-world datasets. This dissertation also presents results for datasets that contain adversarial label noise. A simple extension of this method to a semi-supervised learning approach is also presented. The results show that most mislabeled examples are quickly and effectively identified by the approaches developed in this dissertation.
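The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading: train an SVM on the noisy labels and send only the examples on or inside the margin (the support vectors of a soft-margin SVM) to a human reviewer. The toy data, the flipped indices, the hyperparameters, and the function names `make_toy_data`, `train_linear_svm`, and `review_candidates` are all illustrative assumptions and are not taken from the dissertation. The linear SVM here is trained with plain subgradient descent on the hinge loss to keep the example dependency-free.

```python
# Illustrative sketch only; not the dissertation's implementation.
# Idea: a soft-margin SVM depends on a sparse subset of the training set,
# and mislabeled examples tend to violate the margin, so reviewing only
# that subset can surface label noise at a fraction of the review cost.


def make_toy_data():
    """Two classes along the diagonal; two labels are flipped on purpose."""
    points, labels = [], []
    for k in range(1, 11):
        points.append((float(k), float(k)))    # class +1
        labels.append(1)
        points.append((-float(k), -float(k)))  # class -1
        labels.append(-1)
    flipped = [8, 11]  # hypothetical noise: indices whose labels are corrupted
    noisy_labels = list(labels)
    for i in flipped:
        noisy_labels[i] = -noisy_labels[i]
    return points, noisy_labels, flipped


def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=500):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for (x1, x2), y in zip(points, labels):
            # Only examples inside the margin contribute to the hinge gradient.
            if y * (w[0] * x1 + w[1] * x2 + b) < 1.0:
                grad_w[0] -= y * x1 / n
                grad_w[1] -= y * x2 / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def review_candidates(points, labels, w, b):
    """Return indices with functional margin <= 1, worst violations first."""
    scored = []
    for i, ((x1, x2), y) in enumerate(zip(points, labels)):
        margin = y * (w[0] * x1 + w[1] * x2 + b)
        if margin <= 1.0:
            scored.append((margin, i))
    return [i for _, i in sorted(scored)]


points, noisy_labels, flipped = make_toy_data()
w, b = train_linear_svm(points, noisy_labels)
candidates = review_candidates(points, noisy_labels, w, b)

print("examples to review:", len(candidates), "of", len(points))
print("flipped labels among them:", sorted(set(flipped) & set(candidates)))
```

In this toy setting the reviewer would inspect only the returned candidates, correct any wrong labels, and could then retrain and repeat. Whether the dissertation iterates in this way, and which kernel and parameters it uses, is not stated in the abstract.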
Scholar Commons Citation
Ekambaram, Rajmadhan, "Active Cleaning of Label Noise Using Support Vector Machines" (2017). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/6830