UNDERSTANDING DEEP LEARNING REQUIRES RE-THINKING GENERALIZATION


26 Feb 2017 | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
Deep learning models often have far more parameters than training samples, yet they generalize well. This paper challenges traditional views on generalization with a simple randomization test: deep neural networks easily fit completely random labels, demonstrating that their effective capacity is large enough to memorize the data. In experiments, networks trained on randomly labeled data, or even on random noise in place of the inputs, still reach zero training error; test error then degrades gradually as the fraction of corrupted labels increases, suggesting that the networks extract whatever structure remains in the data while fitting the noise. (A minimal sketch of such a randomization test is given after this summary.)

Traditional capacity measures such as VC-dimension and Rademacher complexity fail to explain this behavior, since they cannot distinguish models that memorize arbitrary labels from models that generalize. Explicit regularization, such as weight decay and dropout, does not sufficiently explain generalization either: networks still generalize well when these regularizers are removed. Implicit forms of regularization, such as batch normalization and early stopping, also play a role, but they are not the sole factor.

On the theoretical side, the paper shows that even a simple two-layer ReLU network with 2n + d weights can represent any labeling of n training samples in d dimensions, underscoring the strong finite-sample expressivity of these models.

The paper argues that the effective capacity of modern neural networks is sufficient to memorize the entire training set, so good generalization cannot be attributed to regularization alone; it may instead be tied to the networks' ability to capture underlying patterns in the data. The optimization algorithm itself matters as well: stochastic gradient descent can act as an implicit regularizer, and for linear models it provably converges to the minimum-norm solution (sketched below). The fundamental reason why over-parameterized networks generalize nonetheless remains unclear. These findings challenge traditional approaches to understanding generalization and suggest that further research is needed to characterize the properties that enable large neural networks to generalize well.
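The randomization test at the heart of the paper is easy to reproduce in miniature. The sketch below (in PyTorch) trains a small, over-parameterized two-layer ReLU network on purely random labels until it memorizes them. The architecture, synthetic Gaussian inputs, and hyperparameters here are illustrative stand-ins, not the paper's setup, which uses Inception-style convolutional networks on CIFAR-10 and ImageNet.

```python
# Minimal sketch of the randomization test: fit random labels with an
# over-parameterized network. Data, model size, and hyperparameters are
# arbitrary choices for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d, k = 512, 64, 10                      # samples, input dim, classes (arbitrary)
x = torch.randn(n, d)                      # synthetic inputs (stand-in for images)
y = torch.randint(0, k, (n,))              # labels drawn uniformly at random

# Two-layer ReLU network with far more weights than training samples.
model = nn.Sequential(nn.Linear(d, 2048), nn.ReLU(), nn.Linear(2048, k))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Plain full-batch gradient steps for brevity (the paper uses mini-batch SGD).
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    train_acc = (model(x).argmax(dim=1) == y).float().mean().item()
# Typically reaches (or approaches) 1.0: the network memorizes the random labels.
print(f"training accuracy on random labels: {train_acc:.3f}")
```

Because the labels carry no signal, any test accuracy above chance is impossible here; the point is simply that nothing about the architecture or the optimizer prevents perfect memorization.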
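The paper also makes the implicit regularization of stochastic gradient descent precise in the linear setting. A compact version of that argument, assuming the n training points x_i in R^d form a data matrix X with invertible Gram matrix XX^T:

```latex
% Sketch of the minimum-norm argument for SGD on linear least squares.
% X is the n-by-d data matrix, y the label vector, eta_t the step size.
\begin{aligned}
w_0 &= 0, \qquad
w_{t+1} = w_t - \eta_t \,\big(x_{i_t}^\top w_t - y_{i_t}\big)\, x_{i_t}
\quad\Longrightarrow\quad
w_t = X^\top \alpha_t \ \text{ for some } \alpha_t \in \mathbb{R}^n, \\
X w_\infty = y
\ &\Longrightarrow\
X X^\top \alpha_\infty = y
\ \Longrightarrow\
w_\infty = X^\top \big(X X^\top\big)^{-1} y .
\end{aligned}
```

Every update adds a multiple of a training point, so the iterates never leave the span of the data; if the final iterate interpolates the data, it must equal X^T (XX^T)^{-1} y, which is exactly the minimum-l2-norm solution of Xw = y. In this sense SGD prefers the smallest-norm model that fits the data, even with no explicit penalty in the objective.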