Graduation Year


Document Type




Degree Granting Department

Computer Science and Engineering

Major Professor

Lawrence O. Hall, Ph.D.

Co-Major Professor

Dmitry B. Goldgof, Ph.D.

Committee Member

Sudeep Sarkar, Ph.D.

Committee Member

Suresh Khator, Ph.D.

Committee Member

Thomas Sanocki, Ph.D.


Machine learning, Data mining, Kernel machines, Active learning, Bit reduction


Learning a predictive model for a large-scale real-world problem presents several challenges: choosing a good feature set and finding a scalable machine learning algorithm with small generalization error. A support vector machine (SVM), based on statistical learning theory, achieves good generalization by restricting the capacity of its hypothesis space. A SVM outperforms classical learning algorithms on many benchmark data sets, and its excellent performance makes it an ideal choice for pattern recognition problems. However, training a SVM involves constrained quadratic programming, which leads to poor scalability. In this dissertation, we propose several methods to improve a SVM's scalability. The evaluation is done mainly in the context of a plankton recognition problem.

One approach is active learning, which selectively asks a domain expert to label a subset of examples from a large pool of unlabeled data. Active learning minimizes the number of labeled examples needed to build an accurate model and reduces the human effort of manually labeling the data. We propose a new active learning method, "Breaking Ties" (BT), for multi-class SVMs. After developing a probability model for multi-class SVMs, BT selects for labeling those examples for which the difference in probability between the predicted most likely class and the second most likely class is smallest. In our plankton recognition system, this simple strategy required several times fewer labeled plankton images to reach a given recognition accuracy than random sampling.
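A minimal sketch of the Breaking Ties selection rule, using scikit-learn's SVC with probability estimates on synthetic data (the data set, pool sizes, and batch size here are illustrative stand-ins, not the dissertation's plankton pipeline):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 3-class data standing in for plankton image features.
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=0)

labeled = np.arange(20)           # small initially labeled set
unlabeled = np.arange(20, 300)    # pool the expert could label

# Fit a multi-class SVM with a probability model on the labeled set.
clf = SVC(probability=True, random_state=0).fit(X[labeled], y[labeled])
proba = clf.predict_proba(X[unlabeled])

# Breaking Ties: for each example, take the gap between the two most
# likely class probabilities; query the examples with the smallest gap.
top2 = np.sort(proba, axis=1)[:, -2:]
margin = top2[:, 1] - top2[:, 0]
query = unlabeled[np.argsort(margin)[:10]]   # next 10 examples to label
```

In a full active-learning loop, the expert's labels for `query` would be added to the labeled set and the model refit, repeating until the labeling budget is exhausted.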

To speed up a SVM's training and prediction, we show how to apply bit reduction to compress the examples into a set of bins. Each bin is assigned a weight based on the number of examples it contains. Treating each bin as a single weighted example, a SVM builds a model from the reduced set of weighted examples.
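One way the binning step above can be sketched: quantize each feature to a small number of bits, group examples that fall into the same bin, and return one representative per bin (here, the bin mean) together with a count used as its weight. The function name, the choice of bin mean as representative, and the bit width are assumptions for illustration:

```python
import numpy as np

def bit_reduce(X, bits=4):
    """Quantize features to `bits` bits, group identical quantized rows
    into bins, and return one representative (bin mean) and a weight
    (example count) per bin."""
    # Scale each feature to [0, 2^bits - 1] and truncate to an integer key.
    lo, hi = X.min(axis=0), X.max(axis=0)
    scale = (2 ** bits - 1) / np.where(hi > lo, hi - lo, 1.0)
    keys = np.floor((X - lo) * scale).astype(int)
    # Group examples whose quantized keys match (same bin).
    uniq, inverse, counts = np.unique(keys, axis=0, return_inverse=True,
                                      return_counts=True)
    # Representative of each bin: mean of its members.
    reps = np.zeros((len(uniq), X.shape[1]))
    np.add.at(reps, inverse, X)
    reps /= counts[:, None]
    return reps, counts

X = np.random.RandomState(0).rand(1000, 3)
reps, weights = bit_reduce(X, bits=2)
# The reduced set can then train a weighted SVM, e.g. scikit-learn's
# SVC().fit(reps, labels, sample_weight=weights).
```

Fewer bits mean coarser bins and a smaller reduced set, trading accuracy for speed; here 2 bits over 3 features allows at most 64 bins for the 1000 examples.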