Reconciling modern machine learning practice and the bias-variance trade-off

September 12, 2019 | Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal
This paper addresses the apparent contradiction between the classical bias-variance trade-off and the observed behavior of modern machine learning methods, which often use very rich models, such as neural networks, that interpolate the training data. The classical trade-off suggests that models should balance under-fitting and over-fitting, yet modern models, despite being highly complex and fitting the training data perfectly, often achieve high accuracy on test data.

The authors propose a unified framework that reconciles these perspectives through the "double descent" risk curve, which subsumes the classical U-shaped bias-variance curve and shows that increasing model capacity beyond the interpolation threshold can improve performance. The double descent curve is demonstrated across several model families and datasets, including neural networks, random Fourier features, and random forests. As model capacity increases, test risk first traces the classical U shape: it decreases, then rises to a peak at the interpolation threshold (the smallest capacity at which the model can fit the training data exactly), and then descends a second time as capacity grows further, indicating that very large models can generalize well despite their high capacity. This phenomenon is attributed to inductive bias: among the many functions that fit the training data, larger model classes allow learning algorithms to find simpler, smoother solutions that generalize better.

The paper also discusses the implications of the double descent curve for both theory and practice in machine learning. It highlights that the classical understanding of model performance covers only the under-parameterized regime, and the double descent curve provides a more comprehensive view of how model capacity affects generalization. The findings suggest that modern machine learning methods, while seemingly overfitting, can achieve good generalization due to their ability to find simpler, more regular solutions.
This has important implications for the design and training of machine learning models, emphasizing the role of inductive bias and the importance of understanding the underlying mechanisms that drive generalization.
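The double descent curve can be reproduced in miniature with one of the model families the paper studies, random Fourier features fit by minimum-norm least squares. The sketch below is a hypothetical toy setup (the target function, noise level, and feature widths are illustrative choices, not the paper's exact experiments): the number of random features plays the role of model capacity, and the pseudoinverse selects the minimum-norm interpolating solution once capacity exceeds the number of training points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem (illustrative target, not from the paper)
def target(x):
    return np.sin(2 * np.pi * x)

n_train, n_test = 20, 200
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
x_test = rng.uniform(-1, 1, n_test)
y_test = target(x_test)

def random_fourier_features(x, n_features, seed=1):
    # phi_k(x) = cos(w_k * x + b_k) with random frequencies and phases;
    # a fixed seed keeps the feature map identical across train and test.
    r = np.random.default_rng(seed)
    w = r.normal(0.0, 5.0, n_features)
    b = r.uniform(0.0, 2 * np.pi, n_features)
    return np.cos(np.outer(x, w) + b)

# Capacities below, at, and above the interpolation threshold (n_train = 20)
widths = [2, 5, 10, 15, 20, 30, 100, 1000]
risks = []
for n_feat in widths:
    phi_train = random_fourier_features(x_train, n_feat)
    phi_test = random_fourier_features(x_test, n_feat)
    # pinv gives the least-squares fit; when n_feat >= n_train it returns
    # the minimum-l2-norm solution among all interpolating coefficient vectors.
    coef = np.linalg.pinv(phi_train) @ y_train
    risks.append(float(np.mean((phi_test @ coef - y_test) ** 2)))

for n_feat, risk in zip(widths, risks):
    print(f"features={n_feat:5d}  test MSE={risk:.4f}")
```

Plotting test MSE against the number of features typically shows the risk rising toward a peak near the interpolation threshold and falling again in the over-parameterized regime, which is the qualitative shape the paper reports.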