A Closer Look at Memorization in Deep Networks


1 Jul 2017 | Devansh Arpit*, Stanisław Jastrzębski*, Nicolas Ballas*, David Krueger*, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien
This paper examines the role of memorization in deep learning, particularly in the context of generalization and adversarial robustness. While deep networks can memorize noisy data, the study suggests that they prioritize learning simple patterns first. Experiments reveal qualitative differences in the optimization behavior of deep neural networks (DNNs) on real vs. noisy data, indicating that DNNs are content-aware and take advantage of patterns shared across examples. Regularization techniques such as dropout can hinder memorization of noise while preserving generalization on real data. The paper also investigates the impact of capacity and effective capacity on learning, finding that higher-capacity models are needed to fit noisy datasets but not real data. Additionally, the study introduces the concept of critical samples to measure the complexity of learned hypotheses, showing that DNNs learn patterns before memorizing noise. Finally, the paper demonstrates that explicit regularizers can control the speed of memorization without significantly impacting generalization on real data. The findings suggest that memorization and generalization in DNNs depend not only on the network architecture and optimization procedure but also on the training data itself.
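To make the real-vs.-noisy-data comparison concrete, the following is a minimal sketch (not the authors' exact experimental protocol): it trains a small MLP on synthetic data in which a fraction of labels have been replaced with random ones, and tracks training accuracy separately on the clean and corrupted subsets. The dataset, model size, learning rate, and number of epochs are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's exact protocol): corrupt a
# fraction of labels and compare how fast a small network fits the clean
# (patterned) subset vs. the randomly relabeled subset.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: labels follow a simple linear rule, so the clean subset
# carries a learnable pattern. All sizes/hyperparameters are assumptions.
n, d, num_classes, noise_frac = 2000, 20, 10, 0.5
X = torch.randn(n, d)
W_true = torch.randn(d, num_classes)
y = (X @ W_true).argmax(dim=1)

# Replace a random fraction of labels with uniformly random labels.
noisy_mask = torch.rand(n) < noise_frac
y[noisy_mask] = torch.randint(0, num_classes, (int(noisy_mask.sum()),))

model = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, num_classes))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(201):
    opt.zero_grad()
    logits = model(X)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()
    if epoch % 20 == 0:
        pred = logits.argmax(dim=1)
        clean_acc = (pred[~noisy_mask] == y[~noisy_mask]).float().mean().item()
        noisy_acc = (pred[noisy_mask] == y[noisy_mask]).float().mean().item()
        print(f"epoch {epoch:3d}  clean-subset acc {clean_acc:.2f}  "
              f"noisy-subset acc {noisy_acc:.2f}")
```

Under the paper's account, accuracy on the clean subset should climb well before the corrupted subset is fitted; training long enough (or increasing capacity) eventually memorizes the random labels as well.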