Measuring classifier performance: a coherent alternative to the area under the ROC curve


2009 | David J. Hand
The area under the ROC curve (AUC) is a widely used measure of classifier performance. However, it has a fundamental incoherence: it uses different misclassification cost distributions for different classifiers, leading to inconsistent evaluations. This is problematic because the relative severity of misclassifications is a property of the problem, not the classifier. The AUC is equivalent to using different metrics for different classifiers, which is nonsensical. A valid alternative is proposed, the H measure, which avoids this incoherence by using a fixed weight distribution over cost ratios. The H measure is defined as 1 minus the ratio of the expected minimum loss to the maximum possible loss. It is shown that the AUC and H measure are not monotonically related, so a classifier may appear better under one measure and worse under the other. The H measure is estimated separately and is more appropriate for comparing classifiers in real-world scenarios where misclassification costs vary. The paper concludes that the AUC is not a reliable measure of classifier performance due to its incoherence, and the H measure provides a more coherent alternative.The area under the ROC curve (AUC) is a widely used measure of classifier performance. However, it has a fundamental incoherence: it uses different misclassification cost distributions for different classifiers, leading to inconsistent evaluations. This is problematic because the relative severity of misclassifications is a property of the problem, not the classifier. The AUC is equivalent to using different metrics for different classifiers, which is nonsensical. A valid alternative is proposed, the H measure, which avoids this incoherence by using a fixed weight distribution over cost ratios. The H measure is defined as 1 minus the ratio of the expected minimum loss to the maximum possible loss. 
It is shown that the AUC and H measure are not monotonically related, so a classifier may appear better under one measure and worse under the other. The H measure is estimated separately and is more appropriate for comparing classifiers in real-world scenarios where misclassification costs vary. The paper concludes that the AUC is not a reliable measure of classifier performance due to its incoherence, and the H measure provides a more coherent alternative.
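The definition above, 1 minus the ratio of the expected minimum loss to that of a trivial classifier, can be sketched numerically. The following is an illustrative implementation, not code from the paper; it assumes class-1 scores tend to be higher than class-0 scores, and uses the Beta(2,2) weight over cost ratios that Hand suggests. The function name `h_measure` and the grid sizes are choices made here for illustration.

```python
import numpy as np

def h_measure(y, s, n_grid=1001):
    """Illustrative H measure for binary labels y (0/1) and scores s.

    For each cost ratio c, the minimum expected loss over all score
    thresholds is computed, then averaged under a Beta(2,2) weight
    w(c) = 6 c (1 - c) and normalised by the loss of a trivial
    classifier that always predicts a single class.
    """
    y = np.asarray(y)
    s = np.asarray(s, dtype=float)
    pi1 = y.mean()          # prior of class 1
    pi0 = 1.0 - pi1         # prior of class 0

    # Candidate thresholds: every observed score, plus the two extremes.
    ts = np.concatenate(([-np.inf], np.sort(s), [np.inf]))

    # Empirical CDFs of the scores within each class:
    # F0(t) = P(s <= t | y = 0), F1(t) = P(s <= t | y = 1).
    s0 = np.sort(s[y == 0])
    s1 = np.sort(s[y == 1])
    F0 = np.searchsorted(s0, ts, side="right") / len(s0)
    F1 = np.searchsorted(s1, ts, side="right") / len(s1)

    # Grid of cost ratios c in (0, 1) and the Beta(2,2) weight.
    cs = np.linspace(0.001, 0.999, n_grid)
    w = 6.0 * cs * (1.0 - cs)

    # Expected loss at cost c and threshold t (predict class 1 if s > t):
    # c * pi0 * (1 - F0(t))  +  (1 - c) * pi1 * F1(t).
    loss = (cs[:, None] * pi0 * (1.0 - F0)[None, :]
            + (1.0 - cs)[:, None] * pi1 * F1[None, :])
    min_loss = loss.min(axis=1)

    # Trivial reference classifier: assign everything to one class.
    trivial = np.minimum(cs * pi0, (1.0 - cs) * pi1)

    # H = 1 - (weighted average min loss) / (weighted average trivial loss);
    # the common grid spacing cancels in the ratio.
    return 1.0 - np.sum(w * min_loss) / np.sum(w * trivial)
```

On this sketch, a classifier that separates the classes perfectly attains H = 1 (its minimum loss is zero at every cost ratio), while one that assigns every case the same score attains H = 0 (it can do no better than the trivial classifier).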