SGDR: Stochastic Gradient Descent with Warm Restarts
Ilya Loshchilov and Frank Hutter
University of Freiburg, Germany
{ilya,fh}@cs.uni-freiburg.de
Abstract: Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence of accelerated gradient schemes on ill-conditioned functions. In this paper, we propose a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. We empirically study its performance on the CIFAR-10 and CIFAR-100 datasets, where we demonstrate new state-of-the-art test errors of 3.14% and 16.21%, respectively. We also demonstrate its advantages on a dataset of EEG recordings and on a downsampled version of the ImageNet dataset. Our source code is available at https://github.com/loshchil/SGDR.
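As a rough illustration of the warm-restart idea referred to above, the following is a minimal sketch of a cosine-annealed learning-rate schedule with periodic restarts. The function name, default values, and parameters (eta_min, eta_max, T_0, T_mult) are placeholders chosen for this sketch, not settings prescribed here.

```python
import math

def sgdr_learning_rate(epoch, eta_min=0.0, eta_max=0.05, T_0=10, T_mult=2):
    """Cosine-annealed learning rate with warm restarts (illustrative sketch).

    Within a cycle of length T_i the rate decays from eta_max to eta_min;
    at the end of each cycle it is reset ("warm restart") and the next
    cycle is T_mult times longer.
    """
    T_i, t_cur = T_0, epoch
    while t_cur >= T_i:          # locate the current cycle and the position within it
        t_cur -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / T_i))

# Example: print the schedule over the first three cycles (lengths 10, 20, 40).
print([round(sgdr_learning_rate(e), 4) for e in range(70)])
```

Each cycle starts at eta_max (the "warm restart") and decays toward eta_min; choosing T_mult > 1 makes successive cycles progressively longer.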
Introduction: Deep neural networks (DNNs) are currently the best-performing method for many classification problems, such as object recognition from images (Krizhevsky et al., 2012a; Donahue et al., 2014) or speech recognition from audio data (Deng et al., 2013). Their training on large datasets (where DNNs perform particularly well) is the main computational bottleneck: it often requires several days, even on high-performance GPUs, and any speedups would be of substantial value.
The training of a DNN with n free parameters can be formulated as the problem of minimizing a function f: R^n → R. The commonly used procedure to optimize f is to iteratively adjust x_t ∈ R^n (the parameter vector at time step t) using gradient information ∇f_t(x_t) obtained on a relatively small t-th batch of b datapoints. Stochastic Gradient Descent (SGD) then extends Gradient Descent (GD) to the stochastic optimization of f as follows:
x_{t+1} = x_t - η_t ∇f_t(x_t),
where η_t is a learning rate. One would like to consider second-order information
x_{t+1} = x_t - η_t H_t^{-1} ∇f_t(x_t),
but this is often infeasible since computing and storing the inverse Hessian H_t^{-1} is intractable for large n. The usual way of dealing with this problem, using limited-memory quasi-Newton methods such as L-BFGS (Liu & Nocedal, 1989), is not currently in favor in deep learning, not least due to (i) the stochasticity of ∇f_t(x_t), (ii) ill-conditioning
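To make the updates above concrete, here is a minimal NumPy sketch of the first-order SGD step x_{t+1} = x_t - η_t ∇f_t(x_t) on a toy least-squares problem. The loss, batch size, and learning rate are illustrative only; the Newton-style step is deliberately not implemented, since it would require forming and inverting an n × n Hessian.

```python
import numpy as np

def sgd_step(x, grad_fn, batch, eta):
    """One SGD step: x_{t+1} = x_t - eta * grad f_t(x_t), with the gradient
    estimated on a mini-batch of datapoints."""
    return x - eta * grad_fn(x, batch)

# Toy example: stochastic minimization of f(x) = 0.5/N * ||A x - y||^2.
rng = np.random.default_rng(0)
A = rng.normal(size=(256, 10))   # 256 datapoints, n = 10 parameters
y = rng.normal(size=256)

def grad_fn(x, idx):
    # Gradient of the mini-batch loss over the rows indexed by `idx`.
    A_b, y_b = A[idx], y[idx]
    return A_b.T @ (A_b @ x - y_b) / len(idx)

x = np.zeros(10)
for t in range(100):
    batch = rng.integers(0, 256, size=16)      # draw a mini-batch of b = 16 points
    x = sgd_step(x, grad_fn, batch, eta=0.05)  # first-order update with eta_t = 0.05
```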