MARCH 2021 | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals
Deep learning models, despite having far more parameters than training examples, often generalize well, which challenges traditional theories of generalization. Through a series of randomization tests, this paper shows that state-of-the-art neural networks trained with stochastic gradient descent can perfectly fit training data whose labels have been replaced by random noise, reaching zero training error; their capacity is therefore sufficient to simply memorize the data. Because the same architectures and training procedures generalize well on the true labels, conventional complexity-based explanations such as uniform convergence, VC dimension, Rademacher complexity, and algorithmic stability cannot by themselves account for the difference between generalizing and memorizing. A sketch of this randomization test is given after the summary.

The experiments also show that explicit regularization (such as weight decay, dropout, and data augmentation) may improve generalization but is neither necessary nor by itself sufficient to explain it: models still generalize reasonably well with explicit regularizers turned off, and they still fit random labels with regularizers in place. Implicit regularization arising from the training procedure, in particular stochastic gradient descent, may instead play an important role. On the theoretical side, the paper proves a finite-sample expressivity result: a simple two-layer ReLU network with 2n + d parameters can express any labeling of any n points in d dimensions, underscoring how easily such networks can memorize. For linear models, the paper shows that SGD converges to the solution with minimal norm among all solutions that fit the training data exactly, a concrete instance of implicit regularization illustrated in a second sketch below.

These findings suggest that generalization in deep learning is not fully explained by traditional theories and that further research is needed to understand the underlying mechanisms. They also highlight the importance of careful empirical studies, since purely theoretical approaches often fail to capture the behavior of large models on real-world data.
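The following is a minimal sketch of the paper's randomization test, written in PyTorch with a small synthetic dataset and a small fully connected network; the architecture, data sizes, and hyperparameters here are illustrative assumptions, not the CIFAR-10 and ImageNet configurations used in the paper.

```python
# Minimal sketch of the randomization test: fit a small over-parameterized MLP
# to data whose labels are pure noise, and watch the training error go to zero.
# Synthetic data and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, num_classes = 256, 32, 10

X = torch.randn(n, d)                           # fixed random inputs
y_random = torch.randint(0, num_classes, (n,))  # labels drawn uniformly at random

model = nn.Sequential(                          # over-parameterized relative to n
    nn.Linear(d, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, num_classes),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    logits = model(X)
    loss = loss_fn(logits, y_random)
    loss.backward()
    opt.step()
    if step % 500 == 0 or step == 1999:
        train_err = (logits.argmax(dim=1) != y_random).float().mean().item()
        print(f"step {step:4d}  loss {loss.item():.4f}  train error {train_err:.3f}")

# With enough capacity and steps, the training error reaches 0 even though the
# labels carry no information, so test accuracy can be no better than chance.
```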
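To illustrate the implicit regularization result for linear models, here is a small NumPy sketch (an illustration under assumptions of my own, not code from the paper): gradient descent on an underdetermined least-squares problem, started from zero, interpolates the training data and lands on the minimum-norm solution given by the pseudoinverse. The same argument applies to SGD, since every update lies in the span of the data points.

```python
# Sketch of implicit regularization in linear models: gradient descent started
# at w = 0 on an underdetermined least-squares problem stays in the row space
# of X and converges to the minimum-norm interpolating solution X^+ y.
# Problem sizes and iteration count below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # fewer equations than unknowns: many exact fits
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                     # start at zero, inside the row space of X
lr = 1.0 / np.linalg.norm(X, 2) ** 2  # safe step size: 1 / lambda_max(X^T X)
for _ in range(5000):               # full-batch gradient descent on 0.5*||Xw - y||^2
    grad = X.T @ (X @ w - y)
    w -= lr * grad

w_min_norm = np.linalg.pinv(X) @ y  # minimum-norm interpolating solution

print("training residual           :", np.linalg.norm(X @ w - y))        # ~0
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))  # ~0
print("norm of GD solution          :", np.linalg.norm(w))
print("norm of min-norm solution    :", np.linalg.norm(w_min_norm))

# Every gradient step is a linear combination of the rows of X, so the iterates
# never leave the row space; the unique interpolating solution in that subspace
# is exactly the minimum-norm one.
```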