Mean Teacher improves semi-supervised deep learning results by averaging model weights instead of label predictions. Without changing the network architecture, it outperforms Temporal Ensembling: it achieves a 4.35% error rate on SVHN with only 250 labels, better than Temporal Ensembling trained with 1000 labels. Combining Mean Teacher with Residual Networks improves the state of the art on CIFAR-10 with 4000 labels from 10.55% to 6.28% error, and on ImageNet 2012 with 10% of the labels from 35.24% to 9.11%.

Mean Teacher scales to large datasets and online learning, and uses unlabeled data more efficiently than the Π model. The teacher's weights are an exponential moving average (EMA) of the student's weights, and the consistency cost is defined as the expected distance between the student's and the teacher's predictions on the same (noisy) input. Because the teacher aggregates information after every training step rather than once per epoch, it provides more accurate targets, which improves test accuracy and enables training with fewer labels.

Experiments show that Mean Teacher performs better than Virtual Adversarial Training on some benchmarks, and that combining the two methods may yield even better results. The approach is compatible with various neural network architectures, including ConvNets and ResNets. Since the success of consistency regularization depends on the quality of teacher-generated targets, Mean Teacher and Virtual Adversarial Training represent two complementary ways of exploiting this principle. The method is also effective in reducing overfitting and improving model generalization.
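The two core ingredients are simple enough to sketch in a few lines: the EMA weight update θ'_t = α·θ'_{t-1} + (1−α)·θ_t and the consistency cost between student and teacher predictions. Below is a minimal PyTorch sketch of one semi-supervised training step under assumed placeholders; the toy linear model, the noise function standing in for data augmentation, and the hyperparameter values are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal Mean Teacher sketch (assumed placeholders, not the paper's code).
import copy
import torch
import torch.nn.functional as F

def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
    # The teacher starts as a copy of the student; it is never trained by
    # backprop, only updated as an EMA of the student's weights.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, alpha: float = 0.999):
    # theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(alpha).add_(sp, alpha=1.0 - alpha)

def consistency_cost(student_logits, teacher_logits):
    # Expected distance between predictions; the paper uses the mean
    # squared error between the two softmax outputs.
    return F.mse_loss(student_logits.softmax(dim=1),
                      teacher_logits.softmax(dim=1))

# --- toy usage: one training step on a partly labeled batch ---
student = torch.nn.Linear(20, 10)        # stand-in for a ConvNet/ResNet
teacher = make_teacher(student)
opt = torch.optim.SGD(student.parameters(), lr=0.1)

x = torch.randn(32, 20)                  # batch: only the first 8 are labeled
y = torch.randint(0, 10, (8,))
noise = lambda t: t + 0.1 * torch.randn_like(t)  # stand-in for augmentation

s_logits = student(noise(x))             # student sees one noisy view
with torch.no_grad():
    t_logits = teacher(noise(x))         # teacher sees another noisy view

w = 1.0                                  # consistency weight (ramped up over training in the paper)
loss = F.cross_entropy(s_logits[:8], y) + w * consistency_cost(s_logits, t_logits)
loss.backward()
opt.step()
opt.zero_grad()
ema_update(teacher, student)             # teacher tracks the student after every step
```

Note that the classification loss uses only the labeled examples, while the consistency cost is computed over the whole batch; this is how the method extracts training signal from unlabeled data.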