DEEP DOUBLE DESCENT: WHERE BIGGER MODELS AND MORE DATA HURT

4 Dec 2019 | Preetum Nakkiran*, Gal Kaplun†, Yamini Bansal†, Tristan Yang, Boaz Barak, Ilya Sutskever
This paper presents empirical evidence of the "double descent" phenomenon in deep learning, in which increasing model size, training time, or the amount of training data can initially hurt test performance before eventually improving it. The authors define a new complexity measure, effective model complexity (EMC): the maximum number of samples on which a training procedure achieves near-zero training error. Using EMC, they show that double descent occurs both as a function of model size ("model-wise double descent") and of training time ("epoch-wise double descent"): test error first decreases, then increases as the model approaches the interpolation threshold, and then decreases again in the over-parameterized regime. The phenomenon is demonstrated across architectures (CNNs, ResNets, Transformers), optimizers (SGD, Adam), and data distributions. The authors also show that increasing the number of training samples can sometimes lead to worse test performance, a phenomenon they call "sample-wise non-monotonicity," contradicting the conventional wisdom that more data is always better.
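For reference, the paper defines EMC with respect to a data distribution D and a tolerance ε as the largest sample size that a training procedure T still fits to average train error at most ε. In the notation below, Error_S(M) denotes the mean error of model M on the training set S:

```latex
% Effective Model Complexity of a training procedure T, w.r.t. distribution D and tolerance eps:
% the largest n such that T, given n i.i.d. samples from D, reaches expected train error <= eps.
\mathrm{EMC}_{\mathcal{D},\epsilon}(\mathcal{T}) :=
  \max\left\{ n \;\middle|\; \mathbb{E}_{S \sim \mathcal{D}^{n}}\!\left[ \mathrm{Error}_{S}\big(\mathcal{T}(S)\big) \right] \le \epsilon \right\}
```

The interpolation threshold is then the point where EMC roughly equals the number of training samples, and the test-error peak described above occurs in a critical interval around this point.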
The paper thus challenges the conventional wisdom that larger models and more data are always better: in some regimes, increasing model size, training time, or dataset size can hurt performance. The authors propose a generalized double descent hypothesis: when EMC is sufficiently below the number of training samples, test error follows the classical U-shaped bias-variance curve; when EMC is close to the number of samples (the interpolation threshold), test error can increase sharply; and once EMC is sufficiently larger than the number of samples, increasing complexity decreases test error again. They also note that double descent is not unique to deep neural networks, having been observed in other settings such as linear regression with random Fourier features. The paper provides extensive experimental results across datasets, architectures, and training procedures, showing that double descent is robust to changes in model size, training time, data augmentation, and label noise (with label noise making the test-error peak more pronounced). The authors conclude that double descent offers new insight into the behavior of deep learning models and underscores the importance of considering model complexity, training time, and sample size together when reasoning about test performance.
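As a concrete illustration of the random-Fourier-features case, the following is a minimal numpy sketch (not the paper's code): it fits minimum-norm least-squares solutions on random Fourier features of a toy 1-D regression task and sweeps the number of features past the number of training samples. The target function, noise level, kernel bandwidth, and feature counts are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch of model-wise double descent with random Fourier features.
# Not the paper's code; all task parameters below are illustrative assumptions.
rng = np.random.default_rng(0)
n_train, n_test, sigma, noise = 100, 2000, 0.2, 0.2

def target(x):
    return np.sin(2 * np.pi * x)

x_tr = rng.uniform(-1.0, 1.0, n_train)
y_tr = target(x_tr) + noise * rng.standard_normal(n_train)
x_te = rng.uniform(-1.0, 1.0, n_test)
y_te = target(x_te)

def rff(x, W, b):
    # Random Fourier features approximating an RBF kernel of bandwidth sigma.
    return np.sqrt(2.0 / W.shape[0]) * np.cos(np.outer(x, W) + b)

for n_feat in [10, 30, 60, 90, 100, 110, 150, 300, 1000, 5000]:
    W = rng.standard_normal(n_feat) / sigma       # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, n_feat)     # random phases
    Phi_tr, Phi_te = rff(x_tr, W, b), rff(x_te, W, b)
    w = np.linalg.pinv(Phi_tr) @ y_tr             # minimum-norm least squares
    test_mse = np.mean((Phi_te @ w - y_te) ** 2)
    print(f"features={n_feat:5d}  test MSE={test_mse:.3f}")
```

With minimum-norm (pseudoinverse) solutions, test error typically peaks when the number of features is close to the number of training samples and decreases again as the feature count grows well past it, mirroring the model-wise double descent curve; the exact shape of the curve depends on the task, noise, and bandwidth choices above.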