The Foundations of Cost-Sensitive Learning


Charles Elkan | IJCAI 2001
This paper revisits the problem of optimal learning and decision-making when different misclassification errors incur different penalties. It characterizes when a cost matrix is reasonable and shows how to avoid economically incoherent cost matrices. For the two-class case, it proves a theorem showing how to change the proportion of negative examples in a training set so that a standard, non-cost-sensitive classifier makes optimal cost-sensitive decisions. However, it argues that changing the balance of negative and positive examples has little effect on the classifiers produced by standard Bayesian and decision tree learning methods. The recommended approach is therefore to learn a classifier from the training set as given and then to compute optimal decisions explicitly using probability estimates.

The paper first discusses cost matrices, their properties, and the difference between costs and benefits. A cost matrix should satisfy "reasonableness" conditions: the cost of labeling an example incorrectly must be greater than the cost of labeling it correctly. The paper also shows that any two-class cost matrix can be transformed into a simpler matrix that leads to the same decisions, so the optimal decision rule reduces to comparing an example's estimated probability of being positive against a single threshold determined by the costs.
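In the paper's notation, c(i, j) is the cost of predicting class i when the true class is j (with 0 = negative, 1 = positive), and the optimal threshold is p* = (c(1,0) - c(0,0)) / (c(1,0) - c(0,0) + c(0,1) - c(1,1)). A minimal Python sketch of this decision rule follows; the function names and the numeric costs in the example are illustrative, not from the paper:

```python
def optimal_threshold(c00, c01, c10, c11):
    """Probability threshold at which predicting positive becomes optimal.

    c_ij is the cost of predicting class i when the true class is j
    (0 = negative, 1 = positive). Assumes a "reasonable" matrix,
    i.e. c10 > c00 and c01 > c11.
    """
    return (c10 - c00) / (c10 - c00 + c01 - c11)


def decide_positive(p, c00, c01, c10, c11):
    """Predict positive iff the estimated probability reaches the threshold."""
    return p >= optimal_threshold(c00, c01, c10, c11)


# Illustrative costs: a false negative costs 10, a false positive costs 1,
# and correct decisions are free. The threshold is 1/11, so even weakly
# positive examples should be labeled positive.
print(optimal_threshold(c00=0.0, c01=10.0, c10=1.0, c11=0.0))     # 0.0909...
print(decide_positive(0.2, c00=0.0, c01=10.0, c10=1.0, c11=0.0))  # True
```

Only cost differences enter the formula, which is why shifting a column of the cost matrix by a constant leaves the optimal decisions unchanged.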
The paper then explores how to achieve cost-sensitivity by rebalancing the training set. Its first theorem gives the formula for adjusting the number of negative examples so that a classifier trained on the rebalanced set makes decisions at a desired probability threshold; the theorem is proved and discussed in detail.

The paper next considers the effect of changing base rates on probability estimates, and its second theorem shows how predicted class-membership probabilities should change in response to a change in base rate. This result is then used to prove the rebalancing theorem.
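Concretely, Theorem 1 states that to make a target threshold p* correspond to the classifier's native threshold p0 (typically 0.5), the number of negative training examples should be multiplied by (p* / (1 - p*)) * ((1 - p0) / p0). Theorem 2 states that an estimate p produced by a classifier trained at base rate b should become p' = b'p(1 - b) / (b - pb + b'p - bb') at a new base rate b'. A minimal sketch of both; names and example values are illustrative:

```python
def negative_multiplier(p_star, p0=0.5):
    """Theorem 1: factor by which to multiply the number of negative
    training examples so that the classifier's native threshold p0
    corresponds to the target cost-sensitive threshold p_star."""
    return (p_star / (1.0 - p_star)) * ((1.0 - p0) / p0)


def adjusted_probability(p, b, b_new):
    """Theorem 2: correct an estimate p made under training base rate b
    for a population whose true base rate is b_new."""
    return (b_new * p * (1.0 - b)) / (b - p * b + b_new * p - b * b_new)


# Sanity checks: an unchanged base rate leaves estimates unchanged,
# and a target threshold of 0.5 needs no rebalancing.
assert abs(adjusted_probability(0.3, b=0.5, b_new=0.5) - 0.3) < 1e-12
assert negative_multiplier(0.5) == 1.0

# For the target threshold p* = 1/11 from the cost matrix above,
# keep only a tenth of the negatives (undersampling).
print(negative_multiplier(1.0 / 11.0))  # 0.1
```

Because the adjustment rescales the odds p / (1 - p) by a constant factor, it is monotonic in p: only the calibration of the estimates changes, never their ranking.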
The paper then examines how changing the proportion of positive and negative examples affects standard learning algorithms. It shows that changing the base rate has little effect on the structure of decision trees grown under certain impurity criteria. It also discusses the effects of pruning and shows that Laplace pruning is equivalent to no pruning when probability smoothing is used.
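The standard smoothing here is the Laplace correction, which replaces a leaf's raw frequency k/n with (k + 1)/(n + 2); a related option is m-estimate smoothing toward the base rate b, (k + bm)/(n + m). A short sketch; the smoothing strength m = 10 is an arbitrary illustration, not a value from the paper:

```python
def laplace_estimate(k, n):
    """Laplace-corrected probability for a leaf holding n training
    examples, k of them positive: (k + 1) / (n + 2) instead of k / n."""
    return (k + 1.0) / (n + 2.0)


def m_estimate(k, n, b, m=10.0):
    """m-estimate smoothing of a leaf's probability toward the base
    rate b; larger m pulls the estimate harder toward b."""
    return (k + b * m) / (n + m)


# A pure leaf with only 3 examples no longer claims certainty:
print(laplace_estimate(3, 3))   # 0.8 rather than 1.0
print(m_estimate(3, 3, b=0.5))  # 0.615...
```

Smoothed leaf estimates matter precisely because the decision rule above compares them against a threshold that may sit far from 0.5.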
Finally, the paper concludes that the recommended way to use Bayesian and decision tree learning methods in domains with differing misclassification costs is to learn a classifier from the training set as given, and then to compute optimal decisions explicitly using the classifier's probability estimates.