Temporal Ensembling for Semi-Supervised Learning


15 Mar 2017 | Samuli Laine, Timo Aila
This paper presents a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of the training data is labeled. The method, called self-ensembling, forms a consensus prediction of the unknown labels from the network's outputs at different training epochs and under different regularization and input augmentation conditions. This ensemble prediction tends to be a better predictor of the unknown labels than the network output at the most recent epoch alone, so it can be used as a training target for the unlabeled data.

The method sets new records on two standard semi-supervised learning benchmarks: on SVHN with 500 labels the error rate drops from 18.44% to 7.05%, and on CIFAR-10 with 4000 labels from 18.63% to 16.55%, with further improvements when standard augmentations are enabled. It also improves CIFAR-100 classification by using random images from the Tiny Images dataset as unlabeled inputs, and it is tolerant to incorrect labels, as demonstrated by its performance when a portion of the labels is randomized.

Two implementations of self-ensembling are presented: the Π-model and temporal ensembling. The Π-model encourages consistent network outputs between two realizations of the same input under different dropout and augmentation conditions; it resembles the Γ-model of the ladder network but is conceptually simpler.
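Below is a minimal sketch of a Π-model training step, assuming a PyTorch classifier with dropout and a stochastic augmentation function; the names `model`, `augment`, `pi_model_step`, and the ramp-up weight `w_t` are illustrative choices, not taken from the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def augment(x):
    # Placeholder for the paper's input augmentations (translations, flips);
    # additive Gaussian noise stands in for them in this sketch.
    return x + 0.15 * torch.randn_like(x)

def pi_model_step(model, x, y, labeled_mask, w_t):
    """One Pi-model step: `y` holds class indices, `labeled_mask` marks labeled rows,
    and `w_t` is the ramped-up weight of the unsupervised term."""
    # Two realizations of the same minibatch under independent
    # augmentation and dropout noise.
    z1 = model(augment(x))  # logits, first pass
    z2 = model(augment(x))  # logits, second pass

    # Supervised term: standard cross-entropy on the labeled subset only.
    sup_loss = F.cross_entropy(z1[labeled_mask], y[labeled_mask])

    # Unsupervised term: squared difference between the two class-probability
    # vectors, encouraging consistent predictions across noise realizations.
    cons_loss = F.mse_loss(F.softmax(z1, dim=1), F.softmax(z2, dim=1))

    return sup_loss + w_t * cons_loss
```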
Temporal ensembling simplifies and extends the Π-model by aggregating network predictions over multiple previous training epochs into a running ensemble that serves as the training target. It is faster, since each input is evaluated only once per epoch, and it produces less noisy targets.

Both the Π-model and temporal ensembling surpass prior state-of-the-art results on CIFAR-10 and SVHN. The approach is compared against other semi-supervised methods, including the Γ-model and the transform/stability loss, as well as against fully supervised training, where it can still improve performance even when all labels are used. The experiments use a specific convolutional network architecture and training parameters, and the results show the method to be effective across a range of semi-supervised image classification tasks.
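The following sketch shows how the temporal-ensembling targets could be maintained, assuming softmax predictions for all N training samples are collected once per epoch; the function name `update_targets` and the array layout are illustrative, while the running ensemble `Z`, the momentum `alpha`, and the startup bias correction follow the paper's description.

```python
import numpy as np

def update_targets(Z, epoch_preds, epoch, alpha=0.6):
    """EMA-accumulate this epoch's predictions and return bias-corrected targets.

    Z           : (N, C) running ensemble of softmax outputs, initialized to zeros
    epoch_preds : (N, C) softmax outputs collected during the current epoch
    epoch       : 1-based index of the current epoch
    """
    # Exponential moving average over epochs.
    Z = alpha * Z + (1.0 - alpha) * epoch_preds
    # Correct the startup bias from the zero initialization, similar to
    # Adam-style moment estimates.
    targets = Z / (1.0 - alpha ** epoch)
    return Z, targets
```

During the next epoch, each sample's bias-corrected target replaces the second network evaluation of the Π-model in the consistency term, which is why the network needs to be run only once per input per epoch.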