Reconciling modern machine learning practice and the bias-variance trade-off

September 12, 2019 | Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal
The paper "Reconciling Modern Machine Learning Practice and the Bias-Variance Trade-Off" by Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal explores the tension between the classical bias-variance trade-off and the observed behavior of modern machine learning methods, particularly neural networks. The authors introduce the concept of a "double descent" risk curve, which subsumes the traditional U-shaped bias-variance trade-off and extends it beyond the point of interpolation. This curve shows that as model capacity increases beyond the point where the model perfectly fits the training data (interpolation), performance can improve, leading to a second descent in test risk.

The paper provides empirical evidence for the double descent phenomenon across various models and datasets, including Random Fourier Features (RFF), Random ReLU features, and fully connected neural networks. It also discusses the inductive biases that underlie this behavior, such as a preference for smoother functions, and the practical implications for model selection and optimization. The authors argue that the double descent curve highlights the limits of classical analyses and opens new avenues for research in machine learning theory and practice.
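The behavior described above can be illustrated with a minimal sketch, not the paper's actual experiments: a toy 1-D regression task fit with random Fourier features, where `numpy.linalg.lstsq` returns the minimum-norm interpolating solution once the number of features exceeds the number of training points. The task, noise level, and frequency scale below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task (an assumed setup, not one of the paper's datasets)
def target(x):
    return np.sin(2 * np.pi * x)

n_train, n_test = 20, 200
x_train = rng.uniform(-1, 1, n_train)
x_test = np.linspace(-1, 1, n_test)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
y_test = target(x_test)

def rff(x, omegas, phases):
    # Random Fourier features: cos(omega * x + phase)
    return np.cos(np.outer(x, omegas) + phases)

def fit_min_norm(n_features):
    # Draw n_features random frequencies/phases; lstsq returns the
    # minimum-norm least-squares solution, which interpolates the
    # training data when n_features >= n_train.
    omegas = rng.normal(0, 5, n_features)
    phases = rng.uniform(0, 2 * np.pi, n_features)
    phi_tr = rff(x_train, omegas, phases)
    phi_te = rff(x_test, omegas, phases)
    w, *_ = np.linalg.lstsq(phi_tr, y_train, rcond=None)
    train_mse = np.mean((phi_tr @ w - y_train) ** 2)
    test_mse = np.mean((phi_te @ w - y_test) ** 2)
    return train_mse, test_mse

# Sweep model size through the interpolation threshold (n_features = n_train = 20)
widths = [5, 10, 20, 40, 200, 1000]
errs = {n: fit_min_norm(n) for n in widths}
for n, (tr, te) in errs.items():
    print(f"n_features={n:5d}  train_mse={tr:.2e}  test_mse={te:.4f}")
```

In this sketch the classical regime (few features) shows the usual U-shaped behavior, the training error drops to numerical zero at and beyond the interpolation threshold, and the test error of the minimum-norm solution typically falls again in the heavily overparameterized regime, mirroring the paper's RFF experiments at a much smaller scale.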