26 Oct 2020 | Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton
This paper presents a semi-supervised learning approach that leverages large, self-supervised models to achieve high accuracy on ImageNet with very few labeled examples. The method has three main steps: unsupervised pretraining of a large ResNet using SimCLRv2, supervised fine-tuning on the few labeled examples, and distillation with unlabeled examples to refine the model and transfer its task-specific knowledge. The key observation is that larger models benefit more from the task-agnostic use of unlabeled data, especially when labels are scarce. After fine-tuning, the large model can be distilled into a much smaller one with little loss in classification accuracy, because the unlabeled examples are then used in a task-specific way.

This approach reaches 73.9% ImageNet top-1 accuracy with just 1% of the labels, a roughly 10× improvement in label efficiency over the previous state of the art. With 10% of the labels, a ResNet-50 trained with this method reaches 77.5% top-1 accuracy, outperforming standard supervised training on the full label set. The experiments also show that bigger models are more label-efficient, that a deeper projection head improves representation learning, and that distillation with unlabeled examples further improves semi-supervised performance. Overall, the method outperforms previous state-of-the-art approaches on ImageNet, demonstrating the effectiveness of large self-supervised models for semi-supervised learning.
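As a rough illustration of the distillation step, the sketch below shows one way the task-specific loss on unlabeled images could be written in PyTorch: the fine-tuned teacher produces a softened class distribution for each unlabeled image, and the student is trained to match it. This is a minimal sketch, not the authors' code; the function name, the default temperature, and the `teacher`, `student`, and `x_u` identifiers in the usage comment are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Soft-label distillation loss on a batch of unlabeled images.

    Cross-entropy between the teacher's softened class distribution and the
    student's predicted distribution, averaged over the batch. The temperature
    value is an assumption; the paper's setup may differ.
    """
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

# Hypothetical training step over a batch of unlabeled images `x_u`:
#   with torch.no_grad():
#       t_logits = teacher(x_u)   # large fine-tuned SimCLRv2 model (frozen)
#   s_logits = student(x_u)       # smaller ResNet being distilled
#   loss = distillation_loss(t_logits, s_logits)
```

Because only the teacher's predictions are needed, no ground-truth labels are used in this step, which is what lets the unlabeled data be exploited in a task-specific way.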