1 Apr 2017 | Gao Huang*, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger
The paper introduces Snapshot Ensembling, a method for building ensembles of neural networks without incurring additional training cost. It exploits the ability of Stochastic Gradient Descent (SGD) to both converge to and escape from local minima: a single network is trained once with a cyclic learning rate schedule that repeatedly anneals the learning rate toward zero and then restarts it, so the optimization trajectory visits \( M \) different local minima during one training run. A model snapshot is saved at each of these minima, and the ensemble prediction is formed by averaging the outputs of the last \( m \) snapshots, where \( m \leq M \). The cyclic schedule both speeds convergence within each cycle and encourages diversity among the snapshots. Experiments on multiple datasets and architectures show that Snapshot Ensembles consistently achieve lower error rates than single models trained with a standard schedule, while approaching the accuracy of traditional ensembles of independently trained networks at a fraction of the training cost, demonstrating the method's effectiveness and efficiency.
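The recipe is compact enough to sketch in code. The following is a minimal, hedged illustration of the idea rather than the authors' implementation: `model`, `train_loader`, and `loss_fn` are assumed placeholders, the hyperparameters are illustrative, and `epochs` is assumed to be divisible by the number of cycles `M`. The learning rate follows the paper's cyclic cosine-annealing formula \( \alpha(t) = \frac{\alpha_0}{2}\left(\cos\left(\frac{\pi \,\mathrm{mod}(t-1, \lceil T/M \rceil)}{\lceil T/M \rceil}\right) + 1\right) \).

```python
# Sketch of Snapshot Ensembling with a cyclic cosine learning rate (PyTorch-style).
# Assumes: `model`, `train_loader`, `loss_fn` are provided by the user; hyperparameters
# below are illustrative, not the paper's exact settings.
import math
import copy
import torch


def snapshot_lr(alpha0, t, T, M):
    """Cyclic cosine annealing: alpha0/2 * (cos(pi * mod(t-1, ceil(T/M)) / ceil(T/M)) + 1)."""
    cycle_len = math.ceil(T / M)
    return alpha0 / 2 * (math.cos(math.pi * ((t - 1) % cycle_len) / cycle_len) + 1)


def train_snapshots(model, train_loader, loss_fn, alpha0=0.1, epochs=300, M=6):
    optimizer = torch.optim.SGD(model.parameters(), lr=alpha0, momentum=0.9)
    T = epochs * len(train_loader)  # total number of SGD iterations
    snapshots, t = [], 0
    for epoch in range(epochs):
        for x, y in train_loader:
            t += 1
            lr = snapshot_lr(alpha0, t, T, M)
            for group in optimizer.param_groups:
                group["lr"] = lr
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        # End of a cycle: the learning rate has annealed to ~0, so the model sits
        # near a local minimum. Save a snapshot before the rate is restarted.
        if (epoch + 1) % (epochs // M) == 0:
            snapshots.append(copy.deepcopy(model.state_dict()))
    return snapshots


def ensemble_predict(model, snapshots, x, m=None):
    """Average softmax predictions of the last m snapshots (m <= M)."""
    m = m or len(snapshots)
    probs = []
    with torch.no_grad():
        for state in snapshots[-m:]:
            model.load_state_dict(state)
            model.eval()
            probs.append(torch.softmax(model(x), dim=1))
    return torch.stack(probs).mean(dim=0)
```

Restarting the learning rate at a large value after each snapshot is what lets SGD escape the current minimum and find a different one, which is the source of diversity the averaged predictions rely on.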