Graduation Year
2005
Document Type
Thesis
Degree
M.S.C.S.
Degree Granting Department
Computer Science
Major Professor
Lawrence O. Hall, Ph.D.
Committee Member
Dmitry Goldgof, Ph.D.
Committee Member
Sudeep Sarkar, Ph.D.
Keywords
Random forests, Nearest centroid, Exodus, ParaView, Region labeling
Abstract
Many simulation data sets are so massive that they must be distributed among disk farms attached to different computing nodes. The data is partitioned into spatially disjoint sets that are not easily transferable among nodes due to bandwidth limitations. Conventional machine learning methods are not designed for this type of data distribution. Experts mark a training data set with different levels of saliency, emphasizing speed rather than accuracy due to the size of the task. The challenge is to develop machine learning methods that learn how the expert has marked the training data so that similar test data sets can be marked more efficiently.
Ensembles of machine learning classifiers are typically more accurate than individual classifiers. An ensemble of machine learning classifiers requires substantially less memory than the corresponding partition of the data set. This allows the transfer of ensembles among partitions. If all the ensembles are sent to each partition, they can vote for a level of saliency for each example in the partition. Different partitions of the data set may not have any salient points, especially if the data set has a time step dimension. This means the learned classifiers for such partitions cannot vote for saliency, since they have not been trained to recognize it.
In this work, we investigate the performance of different ensembles of classifiers on spatially partitioned data sets. Success is measured by the correct recognition of unknown and salient regions of data points.
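As a rough illustration of the voting scheme the abstract describes, the sketch below trains one random forest per spatially disjoint partition and has every partition's model vote on each partition's local examples. It is a minimal sketch only: the partition count, synthetic features, and saliency labels are invented stand-ins, not data or code from the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Simulate K spatially disjoint partitions, each with its own feature vectors
# and expert saliency labels (0 = not salient, 1 = salient). These are
# hypothetical stand-ins for the distributed simulation data.
K = 4
partitions = []
for k in range(K):
    X = rng.normal(loc=k, scale=1.0, size=(500, 3))  # stand-in physics variables
    y = (X[:, 0] > k + 0.5).astype(int)              # stand-in expert marking
    partitions.append((X, y))

# Train one classifier per partition on local data only; the compact models,
# rather than the large raw partitions, are what would be shipped between nodes.
models = [
    RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    for X, y in partitions
]

# At each partition, every model votes on the local examples and the majority
# vote gives the predicted saliency level.
for k, (X, y) in enumerate(partitions):
    votes = np.stack([m.predict(X) for m in models])    # shape: (K, n_examples)
    majority = (votes.sum(axis=0) > K / 2).astype(int)  # majority vote, two classes
    acc = (majority == y).mean()
    print(f"partition {k}: majority-vote accuracy = {acc:.3f}")
```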
Scholar Commons Citation
Shoemaker, Larry, "Ensembles for Distributed Data" (2005). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/862