Deep Networks Always Grok and Here is Why

2024 | Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk
Deep neural networks (DNNs) exhibit a phenomenon called "grokking," where generalization occurs long after the network achieves near-zero training error. The phenomenon has been observed in a variety of settings, including CNNs trained on CIFAR10 and ResNets trained on Imagenette. The paper also introduces "delayed robustness": DNNs become robust to adversarial examples only after both interpolation and generalization have occurred.

The authors propose a new measure of local complexity, which reflects the density of linear regions in the DNN's input space and serves as a progress measure during training. They show that local complexity undergoes a phase transition during training, in which linear regions migrate away from the training samples and toward the decision boundary, and that this migration gives rise to grokking. Grokking thus occurs because the DNN's function linearizes around the training points, yielding a robust partition of the input space.

The authors further show that grokking is not limited to specific tasks or initializations and occurs in a wide range of practical settings. They provide empirical evidence that grokking is a common phenomenon in deep learning and that the local complexity dynamics are directly tied to the emergence of delayed generalization and delayed robustness. The paper also explores how grokking relates to factors such as network architecture, training data, and activation functions. The authors conclude that grokking is a fundamental aspect of DNN training and that further research is needed to understand the underlying mechanisms.
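To make the local complexity idea concrete, below is a minimal sketch of one common proxy for it: counting distinct ReLU activation patterns among points sampled in a small neighborhood of an input, since each distinct pattern corresponds to a different linear region of the network's input-space partition. The toy MLP, the hypercube sampling scheme, and the function names here are illustrative assumptions, not the paper's exact estimator.

```python
import torch
import torch.nn as nn

# Toy ReLU MLP (assumption for illustration; the paper's experiments
# use larger networks such as CNNs and ResNets).
model = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

@torch.no_grad()
def local_complexity(model, x0, radius=0.1, n_samples=512):
    """Proxy for local complexity: the number of distinct ReLU
    activation patterns among points sampled near x0."""
    # Sample points uniformly from a hypercube of half-width `radius`
    # centered at x0 (a simplifying assumption for the neighborhood).
    pts = x0 + radius * (2.0 * torch.rand(n_samples, x0.numel()) - 1.0)
    signs = []
    h = pts
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            # After a ReLU, h > 0 exactly when the pre-activation was
            # positive, so it records which side of each hinge we are on.
            signs.append(h > 0)
    patterns = torch.cat(signs, dim=1)
    # Unique activation patterns ~ number of linear regions that
    # intersect the sampled neighborhood.
    return len({tuple(p.tolist()) for p in patterns})

x0 = torch.zeros(2)
print("distinct linear regions near x0:", local_complexity(model, x0))
```

Tracking this quantity around training points over the course of training would, in the paper's account, show the drop in local region density (linearization around the training data) that coincides with grokking.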