MARCH 2021 | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals
Deep learning models, despite having far more parameters than training examples, often generalize well, which challenges traditional theories of generalization. Through a series of randomization tests, this paper shows that state-of-the-art neural networks trained with stochastic gradient descent can perfectly fit training data whose labels have been replaced by random noise, reaching zero training error; their capacity is therefore sufficient to simply memorize the data. Because the same architectures and training procedures generalize well on the true labels, conventional complexity-based explanations such as uniform convergence, VC dimension, Rademacher complexity, and algorithmic stability cannot by themselves account for the difference between generalizing and memorizing. A sketch of this randomization test is given after the summary.

The experiments also show that explicit regularization (such as weight decay, dropout, and data augmentation) may improve generalization but is neither necessary nor by itself sufficient to explain it: models still generalize reasonably well with explicit regularizers turned off, and they still fit random labels with regularizers in place. Implicit regularization arising from the training procedure, in particular stochastic gradient descent, may instead play an important role. On the theoretical side, the paper proves a finite-sample expressivity result: a simple two-layer ReLU network with 2n + d parameters can express any labeling of any n points in d dimensions, underscoring how easily such networks can memorize. For linear models, the paper shows that SGD converges to the solution with minimal norm among all solutions that fit the training data exactly, a concrete instance of implicit regularization illustrated in a second sketch below.

These findings suggest that generalization in deep learning is not fully explained by traditional theories and that further research is needed to understand the underlying mechanisms. They also highlight the importance of careful empirical studies, since purely theoretical approaches often fail to capture the behavior of large models on real-world data.
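The following is a minimal sketch of the paper's randomization test, written in PyTorch with a small synthetic dataset and a small fully connected network; the architecture, data sizes, and hyperparameters here are illustrative assumptions, not the CIFAR-10 and ImageNet configurations used in the paper.

```python
# Minimal sketch of the randomization test: fit a small over-parameterized MLP
# to data whose labels are pure noise, and watch the training error go to zero.
# Synthetic data and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, num_classes = 256, 32, 10

X = torch.randn(n, d)                           # fixed random inputs
y_random = torch.randint(0, num_classes, (n,))  # labels drawn uniformly at random

model = nn.Sequential(                          # over-parameterized relative to n
    nn.Linear(d, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, num_classes),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    logits = model(X)
    loss = loss_fn(logits, y_random)
    loss.backward()
    opt.step()
    if step % 500 == 0 or step == 1999:
        train_err = (logits.argmax(dim=1) != y_random).float().mean().item()
        print(f"step {step:4d}  loss {loss.item():.4f}  train error {train_err:.3f}")

# With enough capacity and steps, the training error reaches 0 even though the
# labels carry no information, so test accuracy can be no better than chance.
```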
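To illustrate the implicit regularization result for linear models, here is a small NumPy sketch (an illustration under assumptions of my own, not code from the paper): gradient descent on an underdetermined least-squares problem, started from zero, interpolates the training data and lands on the minimum-norm solution given by the pseudoinverse. The same argument applies to SGD, since every update lies in the span of the data points.

```python
# Sketch of implicit regularization in linear models: gradient descent started
# at w = 0 on an underdetermined least-squares problem stays in the row space
# of X and converges to the minimum-norm interpolating solution X^+ y.
# Problem sizes and iteration count below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # fewer equations than unknowns: many exact fits
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                     # start at zero, inside the row space of X
lr = 1.0 / np.linalg.norm(X, 2) ** 2  # safe step size: 1 / lambda_max(X^T X)
for _ in range(5000):               # full-batch gradient descent on 0.5*||Xw - y||^2
    grad = X.T @ (X @ w - y)
    w -= lr * grad

w_min_norm = np.linalg.pinv(X) @ y  # minimum-norm interpolating solution

print("training residual           :", np.linalg.norm(X @ w - y))        # ~0
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))  # ~0
print("norm of GD solution          :", np.linalg.norm(w))
print("norm of min-norm solution    :", np.linalg.norm(w_min_norm))

# Every gradient step is a linear combination of the rows of X, so the iterates
# never leave the row space; the unique interpolating solution in that subspace
# is exactly the minimum-norm one.
```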