Graduation Year

2019

Document Type

Thesis

Degree

M.A.

Degree Name

Master of Arts (M.A.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Lu Lu, Ph.D.

Committee Member

Mingyang Li, Ph.D.

Committee Member

Seung-Yeop Lee, Ph.D.

Keywords

Bootstrapping, Classification, Decision Trees, Machine Learning, Statistics

Abstract

Ensemble methods are commonly used for building predictive models for classification. Models that are unstable to perturbations in the training set, such as the decision tree, often see considerable reductions in error when grouped, using bootstrapped resamples of the training data to train many models. The non-parametric bootstrap, however, has limited efficacy when used on severely imbalanced data, especially when the number of observations of one or more classes is exceptionally small. We explore the fractional random weighted bootstrap, which randomly assigns fractional weights to observations, as an alternative resampling procedure in training machine learning ensembles, particularly decision tree ensembles. We carry out a methodological study comparing the standard bagging and random forest ensemble models for decision trees against their fractionally random weighted alternatives, finding some evidence supporting their use on data with severe imbalance.
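The contrast between the two resampling schemes described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the weight distribution (a uniform Dirichlet draw scaled so the weights sum to the sample size), the function names, and the toy label vector are all assumptions made for illustration. The sketch counts how often the minority class receives no weight at all under each scheme.

```python
import numpy as np

def fractional_random_weights(n, rng):
    """Draw n positive fractional weights that sum to n.

    Assumption: weights follow a uniform Dirichlet(1, ..., 1) scaled by n;
    the thesis may use a different weight distribution.
    """
    return n * rng.dirichlet(np.ones(n))

def bootstrap_counts(n, rng):
    """Integer resampling counts of the standard non-parametric bootstrap."""
    idx = rng.integers(0, n, size=n)       # sample n indices with replacement
    return np.bincount(idx, minlength=n)   # how many times each row was drawn

rng = np.random.default_rng(0)

# Hypothetical, severely imbalanced labels: 3 minority cases out of 200.
y = np.array([1] * 3 + [0] * 197)
n = len(y)
minority = y == 1

n_rep = 1000
missing_np = 0    # resamples in which the minority class is absent
missing_frw = 0   # weight draws in which the minority class has zero weight
for _ in range(n_rep):
    counts = bootstrap_counts(n, rng)
    weights = fractional_random_weights(n, rng)
    missing_np += int(counts[minority].sum() == 0)
    missing_frw += int(weights[minority].sum() == 0.0)

print("non-parametric resamples without a minority case:", missing_np)
print("fractional-weight draws without a minority case: ", missing_frw)
```

Under the non-parametric bootstrap some resamples contain no minority observation at all, whereas every observation keeps a strictly positive weight under the fractional scheme. In an ensemble, such weights could be passed to any learner that accepts per-observation weights, for example through the `sample_weight` argument of scikit-learn's `DecisionTreeClassifier.fit`; whether the thesis does it this way is not stated in the abstract.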
