November 4, 2003 | Peter L. Bartlett, Michael I. Jordan, Jon D. McAuliffe
This paper explores the relationship between the 0-1 loss function and nonnegative surrogate loss functions in the context of binary classification. The authors focus on convex optimization methods, such as support vector machines and boosting, which minimize a convex surrogate of the 0-1 loss. They establish a general quantitative relationship between the risk assessed with the 0-1 loss and the risk assessed with any nonnegative surrogate loss, which yields nontrivial upper bounds on excess classification risk under a minimal condition on the surrogate: a pointwise form of Fisher consistency that the authors call classification-calibration. The paper also discusses the statistical consequences of using surrogates, including regularizing effects and improved convergence rates under low-noise assumptions. The authors present a refined version of their main result for the low-noise setting and apply their findings to estimate convergence rates over function classes. The paper concludes with a discussion of the role of convexity in classification methods and the conditions under which classification-calibration holds for different types of loss functions.
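As an illustration of the quantitative relationship, the paper's comparison theorem says that for a classification-calibrated surrogate φ there is a nondecreasing transform ψ with ψ(R(f) − R*) ≤ R_φ(f) − R_φ*; for the hinge loss, ψ(θ) = θ, so the excess 0-1 risk is bounded by the excess hinge risk. The sketch below is a minimal numerical check of that hinge-loss case on a toy one-dimensional distribution; the distribution, the candidate classifier, and all variable names are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

# Hedged sketch: check that excess 0-1 risk <= excess hinge risk (psi(theta) = theta
# for the hinge loss) on a toy problem. Setup is ours, not from the paper.

rng = np.random.default_rng(0)

def zero_one_risk(f, x, eta):
    # R(f) = E[ eta(X) 1{f(X) <= 0} + (1 - eta(X)) 1{f(X) > 0} ]
    pred_pos = f(x) > 0
    return np.mean(np.where(pred_pos, 1 - eta, eta))

def hinge_risk(f, x, eta):
    # R_phi(f) = E[ eta(X) phi(f(X)) + (1 - eta(X)) phi(-f(X)) ],  phi(t) = max(0, 1 - t)
    fx = f(x)
    return np.mean(eta * np.maximum(0.0, 1.0 - fx) + (1 - eta) * np.maximum(0.0, 1.0 + fx))

# Toy distribution: X ~ Uniform(-1, 1), eta(x) = P(Y = 1 | X = x) = (x + 1) / 2.
x = rng.uniform(-1.0, 1.0, size=200_000)
eta = (x + 1.0) / 2.0

def bayes(x):
    # sign(2*eta(x) - 1) minimizes both the 0-1 risk and the hinge risk here.
    return np.sign(2.0 * ((x + 1.0) / 2.0) - 1.0)

def candidate(x):
    # A deliberately shifted classifier with nonzero excess risk.
    return x - 0.3

excess_01 = zero_one_risk(candidate, x, eta) - zero_one_risk(bayes, x, eta)
excess_hinge = hinge_risk(candidate, x, eta) - hinge_risk(bayes, x, eta)
print(f"excess 0-1 risk:   {excess_01:.4f}")
print(f"excess hinge risk: {excess_hinge:.4f}")   # expect excess_01 <= excess_hinge
```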