UNDERSTANDING DEEP LEARNING REQUIRES RE-THINKING GENERALIZATION


26 Feb 2017 | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
Deep learning models often have far more parameters than training samples, yet they generalize well. This paper challenges traditional views on generalization with a simple randomization test: deep neural networks easily fit completely random labels, demonstrating that their effective capacity is large enough to memorize the data. In experiments, networks trained on randomly labeled data, or even on random noise in place of the inputs, still reach zero training error; test error then degrades gradually as the fraction of corrupted labels increases, suggesting that the networks extract whatever structure remains in the data while fitting the noise. (A minimal sketch of such a randomization test is given after this summary.)

Traditional capacity measures such as VC-dimension and Rademacher complexity fail to explain this behavior, since they cannot distinguish models that memorize arbitrary labels from models that generalize. Explicit regularization, such as weight decay and dropout, does not sufficiently explain generalization either: networks still generalize well when these regularizers are removed. Implicit forms of regularization, such as batch normalization and early stopping, also play a role, but they are not the sole factor.

On the theoretical side, the paper shows that even a simple two-layer ReLU network with 2n + d weights can represent any labeling of n training samples in d dimensions, underscoring the strong finite-sample expressivity of these models.

The paper argues that the effective capacity of modern neural networks is sufficient to memorize the entire training set, so good generalization cannot be attributed to regularization alone; it may instead be tied to the networks' ability to capture underlying patterns in the data. The optimization algorithm itself matters as well: stochastic gradient descent can act as an implicit regularizer, and for linear models it provably converges to the minimum-norm solution (sketched below). The fundamental reason why over-parameterized networks generalize nonetheless remains unclear. These findings challenge traditional approaches to understanding generalization and suggest that further research is needed to characterize the properties that enable large neural networks to generalize well.
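The randomization test at the heart of the paper is easy to reproduce in miniature. The sketch below (in PyTorch) trains a small, over-parameterized two-layer ReLU network on purely random labels until it memorizes them. The architecture, synthetic Gaussian inputs, and hyperparameters here are illustrative stand-ins, not the paper's setup, which uses Inception-style convolutional networks on CIFAR-10 and ImageNet.

```python
# Minimal sketch of the randomization test: fit random labels with an
# over-parameterized network. Data, model size, and hyperparameters are
# arbitrary choices for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d, k = 512, 64, 10                      # samples, input dim, classes (arbitrary)
x = torch.randn(n, d)                      # synthetic inputs (stand-in for images)
y = torch.randint(0, k, (n,))              # labels drawn uniformly at random

# Two-layer ReLU network with far more weights than training samples.
model = nn.Sequential(nn.Linear(d, 2048), nn.ReLU(), nn.Linear(2048, k))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Plain full-batch gradient steps for brevity (the paper uses mini-batch SGD).
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    train_acc = (model(x).argmax(dim=1) == y).float().mean().item()
# Typically reaches (or approaches) 1.0: the network memorizes the random labels.
print(f"training accuracy on random labels: {train_acc:.3f}")
```

Because the labels carry no signal, any test accuracy above chance is impossible here; the point is simply that nothing about the architecture or the optimizer prevents perfect memorization.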
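The paper also makes the implicit regularization of stochastic gradient descent precise in the linear setting. A compact version of that argument, assuming the n training points x_i in R^d form a data matrix X with invertible Gram matrix XX^T:

```latex
% Sketch of the minimum-norm argument for SGD on linear least squares.
% X is the n-by-d data matrix, y the label vector, eta_t the step size.
\begin{aligned}
w_0 &= 0, \qquad
w_{t+1} = w_t - \eta_t \,\big(x_{i_t}^\top w_t - y_{i_t}\big)\, x_{i_t}
\quad\Longrightarrow\quad
w_t = X^\top \alpha_t \ \text{ for some } \alpha_t \in \mathbb{R}^n, \\
X w_\infty = y
\ &\Longrightarrow\
X X^\top \alpha_\infty = y
\ \Longrightarrow\
w_\infty = X^\top \big(X X^\top\big)^{-1} y .
\end{aligned}
```

Every update adds a multiple of a training point, so the iterates never leave the span of the data; if the final iterate interpolates the data, it must equal X^T (XX^T)^{-1} y, which is exactly the minimum-l2-norm solution of Xw = y. In this sense SGD prefers the smallest-norm model that fits the data, even with no explicit penalty in the objective.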