25 Feb 2019 | Pavel Izmailov*, Dmitrii Podoprikhin*, Timur Garipov*, Dmitry Vetrov, Andrew Gordon Wilson
The paper introduces Stochastic Weight Averaging (SWA), a method that improves the generalization of deep neural networks by averaging multiple points along the trajectory of Stochastic Gradient Descent (SGD) with a cyclical or constant learning rate. SWA is shown to find flatter solutions than conventional SGD, leading to better test accuracy on various datasets and architectures, including ResNet, PyramidNet, DenseNet, and Shake-Shake networks. The method is easy to implement, has minimal computational overhead, and can be used as a drop-in replacement for standard SGD training. The paper also discusses the connection between SWA and Fast Geometric Ensembling (FGE), demonstrating that SWA can approximate FGE ensembles with a single model. Experimental results on CIFAR-10, CIFAR-100, and ImageNet show significant improvements in test accuracy compared to conventional SGD training.
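Below is a minimal PyTorch sketch of the two ingredients the summary refers to: the running average of weights, w_SWA ← (n·w_SWA + w)/(n + 1), and the linear cyclical learning-rate schedule that anneals from α₁ to α₂ within each cycle of length c. The network, batch data, and hyperparameter values are placeholders chosen for illustration, not the paper's experimental settings.

```python
import copy
import torch
import torch.nn as nn


@torch.no_grad()
def update_swa(swa_model, model, n_averaged):
    """Apply the SWA running-average update
        w_SWA <- (n * w_SWA + w) / (n + 1)
    to every parameter of swa_model, using the current weights of model."""
    for p_swa, p in zip(swa_model.parameters(), model.parameters()):
        if n_averaged == 0:
            p_swa.copy_(p)                       # first snapshot initializes the average
        else:
            p_swa.add_((p - p_swa) / (n_averaged + 1))
    return n_averaged + 1


def cyclical_lr(i, c, alpha_1, alpha_2):
    """Linear cyclical schedule: within each cycle of length c, the learning
    rate is annealed from alpha_1 down to alpha_2 as the iteration i advances."""
    t = ((i - 1) % c + 1) / c
    return (1 - t) * alpha_1 + t * alpha_2


# Sketch of an SWA training loop (model, data, and hyperparameters are stand-ins):
# run SGD with the cyclical schedule and fold the weights reached at the end of
# each cycle into the running average; with a constant learning rate the weights
# would instead be averaged at a fixed interval (e.g., once per epoch).
model = nn.Linear(10, 1)                          # placeholder network
swa_model = copy.deepcopy(model)                  # holds the averaged weights
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
n_averaged, c, alpha_1, alpha_2 = 0, 100, 0.05, 0.001

for i in range(1, 1001):                          # 1-indexed iteration counter
    for group in optimizer.param_groups:
        group["lr"] = cyclical_lr(i, c, alpha_1, alpha_2)
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # stand-in training batch
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if i % c == 0:                                # end of a cycle: average the weights
        n_averaged = update_swa(swa_model, model, n_averaged)

# For architectures with batch normalization, the running statistics of swa_model
# should be recomputed with a forward pass over the training data before evaluation,
# since the averaged weights were never used during training.
```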