Is Error-Based Pruning Redeemable?

Document Type

Article

Publication Date

2003

Keywords

Decision tree, pruning, error-based pruning, reduced-error pruning

Digital Object Identifier (DOI)

https://doi.org/10.1142/S0218213003001228

Abstract

Error-based pruning can be used to prune a decision tree without requiring validation data, and it is implemented in the widely used C4.5 decision tree software. It uses a parameter, the certainty factor, that affects the size of the pruned tree. Several researchers have compared error-based pruning with other approaches and reported results suggesting that it produces larger trees with no gain in accuracy. They further suggest that as more data is added to the training set, the size of the tree after error-based pruning continues to grow even though accuracy does not improve. It appears that these results were obtained with the default certainty factor value. Here, we show that varying the certainty factor yields significantly smaller trees with minimal or no loss of accuracy. Also, the growth of tree size with added data can be halted with an appropriate choice of certainty factor. Methods of determining the certainty factor are discussed for both small and large data sets. Experimental results support the conclusion that error-based pruning can produce appropriately sized trees with good accuracy when compared with reduced-error pruning.
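To illustrate the mechanism the abstract refers to: in C4.5-style error-based pruning, the certainty factor (CF) sets the confidence level for an upper bound on a node's true error rate, estimated from its training-set errors; a subtree is pruned when the pessimistic error of the replacement leaf is no worse than that of the subtree. The sketch below (not the paper's code; function names are our own) computes this upper confidence limit by bisection on the exact binomial CDF, which matches the closed form 1 − CF^(1/N) in the zero-error case.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def pessimistic_error(errors: int, n: int, cf: float = 0.25,
                      tol: float = 1e-9) -> float:
    """Upper confidence limit on the true error rate, given `errors`
    misclassified cases out of `n` at certainty factor `cf`.

    Found by bisection: the binomial CDF at the observed error count
    decreases monotonically in p, so we search for the p where it
    drops to `cf`. (Illustrative sketch of the C4.5-style estimate,
    not the exact C4.5 implementation.)
    """
    if errors >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(errors, n, mid) > cf:
            lo = mid  # bound still too low: CDF above cf
        else:
            hi = mid
    return (lo + hi) / 2


# A leaf with 0 errors on 6 cases at the C4.5 default CF = 0.25
# is charged a pessimistic error rate of 1 - 0.25**(1/6), about 0.206.
print(pessimistic_error(0, 6, 0.25))
```

Lowering the certainty factor inflates the error bound and makes pruning more aggressive (smaller trees); raising it does the opposite, which is why the choice of CF governs the tree-size behavior discussed above.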

Was this content written or created while at USF?

Yes

Citation / Publisher Attribution

International Journal on Artificial Intelligence Tools, v. 12, issue 3, p. 249-264