2024-06-05 | Jaerin Lee, Bong Gyun Kang, Kihoon Kim, Kyoung Mu Lee
Grokfast is an algorithm that accelerates the grokking phenomenon in machine learning, in which a model achieves generalization long after overfitting to the training data. The key idea is to amplify the slow-varying components of the gradients to speed up generalization. Treating the sequence of parameter updates as a random signal over training iterations, the method spectrally decomposes the gradient dynamics into fast-varying and slow-varying components; the slow-varying component drives generalization, so amplifying it accelerates grokking. Concretely, a low-pass filter extracts the slow gradient component, which is amplified and added back to the gradient before it is passed to the optimizer. Two variants are presented: Grokfast-MA, which uses a windowed moving-average filter, and Grokfast-EMA, which uses an exponential moving average. Both variants substantially reduce the number of training iterations required to reach generalization. Evaluated on tasks spanning image, language, and graph data, Grokfast accelerates grokking by up to 50x, making it a practical tool for machine learning practitioners. The algorithm is implemented in Python and is available on GitHub.
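To make the mechanism concrete, below is a minimal PyTorch-style sketch of the EMA variant as described above: an exponential moving average of each parameter's gradient serves as the low-pass filter, and the amplified slow component is added back to the gradient before the optimizer step. The function name `gradfilter_ema` and the hyperparameter values (`alpha`, `lamb`) are illustrative assumptions, not necessarily the names or defaults used in the official repository.

```python
import torch

def gradfilter_ema(model: torch.nn.Module, grads=None, alpha: float = 0.98, lamb: float = 2.0):
    """Grokfast-EMA-style gradient filter (illustrative sketch).

    Keeps an exponential moving average (low-pass filter) of each parameter's
    gradient and adds the amplified slow-varying component back into p.grad,
    so any standard optimizer then consumes the modified gradients.
    """
    if grads is None:
        # Initialize the EMA state with the current gradients.
        grads = {n: p.grad.detach().clone() for n, p in model.named_parameters()
                 if p.requires_grad and p.grad is not None}
    for n, p in model.named_parameters():
        if p.requires_grad and p.grad is not None:
            # Low-pass filter: the EMA tracks the slow-varying gradient component.
            grads[n] = alpha * grads[n] + (1.0 - alpha) * p.grad.detach()
            # Amplify the slow component and add it to the raw gradient.
            p.grad = p.grad + lamb * grads[n]
    return grads

# Hypothetical usage inside a training loop, between backward() and step():
#   grads = None
#   for x, y in loader:
#       optimizer.zero_grad()
#       loss = criterion(model(x), y)
#       loss.backward()
#       grads = gradfilter_ema(model, grads, alpha=0.98, lamb=2.0)
#       optimizer.step()
```

Because the filter only rewrites `p.grad` after backpropagation, it composes with any optimizer (SGD, Adam, etc.) without modifying the optimizer itself; the MA variant would replace the EMA state with a fixed-length window of past gradients and average over it instead.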