Exploring Generalization in Deep Learning

6 Jul 2017 | Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro
This paper explores the factors that drive generalization in deep learning, examining several proposed explanations: norm-based capacity control, sharpness, and robustness. It highlights the importance of scale normalization, connects sharpness to PAC-Bayes theory, and asks how well each measure explains the generalization phenomena observed in practice.

Deep neural networks generalize well despite having far more parameters than training examples, yet simply minimizing training error is not sufficient for good generalization. The choice of optimization algorithm also matters: Path-SGD, an optimization algorithm invariant to weight rescaling, generalizes better than standard SGD.

The paper discusses several complexity measures for neural networks, including norms, margins, and sharpness, and emphasizes the need to relate the scale of the parameters to the scale of the outputs, for example by normalizing a norm-based measure by the classification margin. Sharpness alone is not sufficient to guarantee generalization, but combining it with norm-based measures through a PAC-Bayes analysis yields a more effective complexity measure.
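To see why scale normalization matters, note that a ReLU network computes the same function when one layer's weights are multiplied by a positive constant and the next layer's are divided by the same constant, even though per-layer l2 norms change arbitrarily; a measure (or optimizer) sensitive to this rescaling cannot by itself track generalization. The following minimal sketch, using PyTorch with arbitrary toy dimensions chosen here for illustration, demonstrates the invariance:

```python
import torch

# Two-layer bias-free ReLU net: scale the first layer up by c and the second down by c.
# ReLU is positively homogeneous, so the network computes the same function,
# while the per-layer l2 norms change drastically. (Toy sizes are illustrative.)
torch.manual_seed(0)
w1, w2 = torch.randn(256, 784), torch.randn(10, 256)
x = torch.randn(5, 784)

def net(a, b, x):
    return torch.relu(x @ a.t()) @ b.t()

c = 10.0
out_orig = net(w1, w2, x)
out_scaled = net(c * w1, w2 / c, x)
print(torch.allclose(out_orig, out_scaled, rtol=1e-4, atol=1e-3))  # True: same function
print(w1.norm().item(), (c * w1).norm().item())                    # l2 norm grew 10x
```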
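One way to make a scale-insensitive measure concrete is to combine a rescaling-invariant norm with the classification margin. The sketch below computes the l2 path norm of a bias-free fully connected ReLU network and divides it by a low-percentile margin over the training set. It is a minimal illustration rather than the paper's evaluation code: the two-layer architecture, the 784/256/10 dimensions, the random data, and the 10th-percentile margin are assumptions made here (the paper also considers other norms, such as l2 and spectral norms).

```python
import torch
import torch.nn as nn

def path_norm(model, input_dim):
    """Path norm of a bias-free fully connected ReLU net: the square root of the
    sum, over all input-to-output paths, of the product of squared weights.
    Equivalently: push an all-ones input through the squared weight matrices."""
    x = torch.ones(1, input_dim)
    for layer in model:
        if isinstance(layer, nn.Linear):
            x = x @ (layer.weight.t() ** 2)
    return x.sum().sqrt()

def margins(model, inputs, labels):
    """Per-example margin: score of the true class minus the best competing score."""
    with torch.no_grad():
        scores = model(inputs)
    true = scores.gather(1, labels.unsqueeze(1)).squeeze(1)
    rest = scores.clone()
    rest.scatter_(1, labels.unsqueeze(1), float('-inf'))
    return true - rest.max(dim=1).values

# Toy bias-free model and random data (hypothetical, for illustration only).
model = nn.Sequential(nn.Linear(784, 256, bias=False), nn.ReLU(),
                      nn.Linear(256, 10, bias=False))
x, y = torch.randn(128, 784), torch.randint(0, 10, (128,))

# Normalize the norm by a low-percentile margin; for a trained network that fits
# the data this margin is positive, so the ratio is a rescaling-invariant capacity proxy.
gamma = torch.quantile(margins(model, x, y), 0.1).clamp(min=1e-6)
print(f"margin-normalized path norm: {(path_norm(model, 784) / gamma).item():.3f}")
```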
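Sharpness measures how much the training loss can increase under small perturbations of the weights. The sketch below estimates it with random perturbations bounded by alpha * (|w| + 1); this random search is only a stand-in for the constrained maximization used in practice, and the toy model, data, and the values of alpha and trials are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def sharpness(model, inputs, labels, alpha=5e-4, trials=10):
    """Crude estimate of sharpness: the largest relative increase in training loss
    found when every parameter is perturbed by at most alpha * (|w| + 1).
    Random search stands in for a proper constrained maximization."""
    with torch.no_grad():
        base = F.cross_entropy(model(inputs), labels).item()
        worst = base
        for _ in range(trials):
            perturbed = copy.deepcopy(model)
            for p in perturbed.parameters():
                bound = alpha * (p.abs() + 1)
                p.add_((2 * torch.rand_like(p) - 1) * bound)
            worst = max(worst, F.cross_entropy(perturbed(inputs), labels).item())
    return (worst - base) / (1 + base)

# Toy model and data (hypothetical, for illustration only).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x, y = torch.randn(128, 784), torch.randint(0, 10, (128,))
print(f"estimated sharpness: {sharpness(model, x, y):.4f}")
```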
The paper evaluates candidate complexity measures on two criteria: their theoretical ability to guarantee generalization and their empirical ability to explain observed phenomena. Norms and margins can explain much of the observed behavior, while sharpness on its own cannot; the role of Lipschitz continuity and robustness is also examined. Empirically, models trained on true labels generalize better than models trained on random labels, and increasing the number of hidden units can improve generalization even without reducing training error. The paper concludes that a combination of sharpness and norms, made precise through PAC-Bayes, provides a more effective measure of complexity for explaining generalization in deep learning.
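The PAC-Bayes view makes the combination of sharpness and norms concrete: perturbing the weights with Gaussian noise yields an expected-sharpness term, while the KL divergence between the perturbed weights and a zero-mean Gaussian prior reduces to a weight-norm term. The sketch below estimates both by Monte Carlo; the toy model, the choice of sigma, and the constants used when combining the terms are illustrative assumptions rather than the paper's exact bound.

```python
import copy
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def pac_bayes_terms(model, inputs, labels, sigma=0.01, samples=10):
    """Monte-Carlo estimate of the two quantities combined in the PAC-Bayes view:
      expected sharpness  E_u[L(w + u)] - L(w),  u ~ N(0, sigma^2 I)
      KL term             ||w||^2 / (2 sigma^2)  (Gaussian posterior vs. zero-mean prior)."""
    with torch.no_grad():
        base = F.cross_entropy(model(inputs), labels).item()
        noisy_loss = 0.0
        for _ in range(samples):
            noisy = copy.deepcopy(model)
            for p in noisy.parameters():
                p.add_(sigma * torch.randn_like(p))
            noisy_loss += F.cross_entropy(noisy(inputs), labels).item()
        expected_sharpness = noisy_loss / samples - base
        weight_norm_sq = sum((p ** 2).sum().item() for p in model.parameters())
        kl_term = weight_norm_sq / (2 * sigma ** 2)
    return expected_sharpness, kl_term

# Toy model and data (hypothetical, for illustration only).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x, y = torch.randn(128, 784), torch.randint(0, 10, (128,))
es, kl = pac_bayes_terms(model, x, y)

# Illustrative McAllester-style combination of the two terms; the constants here
# are not the paper's exact statement.
m = x.shape[0]
gap = es + 4 * math.sqrt((kl + math.log(2 * m / 0.05)) / m)
print(f"expected sharpness: {es:.4f}, KL term: {kl:.1f}, illustrative bound gap: {gap:.2f}")
```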