1 Apr 2017 | Gao Huang*, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger
Snapshot Ensembling is a method for obtaining an ensemble of neural networks at no additional training cost. It leverages the ability of Stochastic Gradient Descent (SGD) to both converge into and escape from local minima: a single network is trained with a cyclic cosine-annealing learning rate schedule, and its parameters are saved as a "snapshot" each time the learning rate anneals to its minimum, i.e., at each local minimum visited along the optimization path. The aggressive annealing encourages rapid convergence within each cycle, while the learning-rate restarts push the model toward a different minimum in the next cycle, so the snapshots are both accurate and diverse.

The resulting Snapshot Ensembles are simple to build and achieve lower error rates than state-of-the-art single models while performing well compared to traditional ensembles. On CIFAR-10 and CIFAR-100, DenseNet Snapshot Ensembles reach error rates of 3.4% and 17.4%, respectively. The technique is compatible with a wide range of network architectures and learning tasks and can be combined with other accuracy-improving techniques. Because all snapshots come from a single training run, the ensemble's training time equals that of one model, which also makes the approach complementary to methods that aim to reduce training cost. The idea is inspired by recent findings on cyclic learning rates and exploits the diversity of the local minima visited during training.
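Below is a minimal sketch of the mechanism, assuming a PyTorch-style training loop; the tiny model, random data, step counts, and the helper name `snapshot_lr` are illustrative stand-ins rather than the paper's released code. Within each cycle the learning rate is annealed from its maximum toward zero with a cosine schedule, and a snapshot of the weights is saved just before the rate is reset.

```python
import copy
import math

import torch
import torch.nn as nn

def snapshot_lr(step, steps_per_cycle, lr_max):
    """Cyclic cosine annealing: the rate falls from lr_max to ~0 within a cycle,
    then restarts; lr(t) = lr_max / 2 * (cos(pi * (t mod c) / c) + 1)."""
    t = step % steps_per_cycle
    return 0.5 * lr_max * (math.cos(math.pi * t / steps_per_cycle) + 1)

# Toy stand-ins for a DenseNet/ResNet trained on CIFAR.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(256, 10), torch.randint(0, 2, (256,))

total_steps, num_cycles, lr_max = 600, 6, 0.1   # e.g. M = 6 snapshots
steps_per_cycle = total_steps // num_cycles
snapshots = []

for step in range(total_steps):
    lr = snapshot_lr(step, steps_per_cycle, lr_max)
    for group in optimizer.param_groups:        # set the current learning rate
        group["lr"] = lr
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    # End of a cycle: the rate has annealed to ~0, so the model sits in (or
    # near) a local minimum -- save a snapshot before the rate resets.
    if (step + 1) % steps_per_cycle == 0:
        snapshots.append(copy.deepcopy(model.state_dict()))

print(f"collected {len(snapshots)} snapshots")
```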
In the experiments, Snapshot Ensembles consistently outperform single models and compare favorably with traditional ensembles despite requiring only a single model's training budget, and they are particularly effective on the more complex datasets, CIFAR-100 and Tiny ImageNet. The method is robust to the number of annealing cycles and to the overall training budget, yielding gains in both low- and high-budget regimes. Analyses in parameter space and activation space show that the snapshots are genuinely diverse: they make different predictions at test time, which is exactly what makes them well-suited for test-time ensembling. The method is simple and effective, and it can be combined with traditional ensembles for further improvements.
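Continuing the sketch above (same hedged assumptions, reusing the `model`, `snapshots`, and `x` placeholders), test-time ensembling amounts to averaging the softmax outputs of the saved snapshots and predicting the class with the highest mean probability:

```python
import torch

def ensemble_predict(model, snapshots, inputs):
    """Equal-weight test-time ensembling: load each snapshot in turn and
    average the softmax outputs across snapshots."""
    probs = []
    with torch.no_grad():
        for state in snapshots:
            model.load_state_dict(state)
            model.eval()
            probs.append(torch.softmax(model(inputs), dim=1))
    return torch.stack(probs).mean(dim=0)

avg_probs = ensemble_predict(model, snapshots, x)
predictions = avg_probs.argmax(dim=1)
print(predictions.shape)   # one predicted class per input example
```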