9 Mar 2020 | Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He
This paper presents improved baselines for unsupervised learning using the Momentum Contrast (MoCo) framework. The authors verify the effectiveness of two design improvements from SimCLR—using an MLP projection head and stronger data augmentation—by implementing them in MoCo. These modifications lead to stronger baselines that outperform SimCLR and do not require large training batches. The MoCo v2 baselines can run on a typical 8-GPU machine and achieve better results than SimCLR, which requires TPU support and large batches. The authors also show that the MLP head and data augmentation are orthogonal to the MoCo framework and improve image classification and object detection transfer learning results.
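As a concrete illustration of the first modification, the sketch below replaces a ResNet-50 encoder's single fully connected projection layer with a 2-layer MLP head (hidden layer with ReLU), in the style of the public MoCo code. The 128-d output dimension and the hidden width equal to the backbone feature dimension are assumptions for illustration, not values stated in this summary.

```python
import torch.nn as nn
import torchvision.models as models

# ResNet-50 encoder whose classifier outputs a 128-d embedding
# (128 is an assumed embedding size, as in the public MoCo code).
encoder = models.resnet50(num_classes=128)

# MoCo v1-style head: the single fc layer created above.
dim_mlp = encoder.fc.weight.shape[1]  # 2048-d features for ResNet-50

# MoCo v2-style head: replace the fc layer with a 2-layer MLP (hidden layer + ReLU).
encoder.fc = nn.Sequential(
    nn.Linear(dim_mlp, dim_mlp),
    nn.ReLU(inplace=True),
    nn.Linear(dim_mlp, 128),
)
```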
The paper evaluates MoCo v2 on ImageNet linear classification and PASCAL VOC object detection. The MLP head alone improves ImageNet linear classification accuracy from 60.6% to 66.2%, while the stronger data augmentation alone improves the MoCo baseline by 2.8%, to 63.4%. Notably, the augmentation-only model achieves higher detection accuracy than the MLP-only model, despite its lower linear classification accuracy (63.4% vs. 66.2%). Combining the MLP head and the extra augmentation raises ImageNet accuracy to 67.3%.
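The stronger augmentation can be sketched with standard torchvision transforms as below. The specific probabilities, jitter strengths, and blur parameters follow the values used in the public MoCo v2 and SimCLR code and should be read as illustrative, not as exact settings stated in this summary.

```python
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Stronger, SimCLR-style augmentation for each view: random crop, heavier
# color jitter, random grayscale, and Gaussian blur, each applied stochastically.
augmentation = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    # Kernel size of roughly 10% of the image side is a common choice.
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))], p=0.5),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
```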
The paper compares MoCo v2 with SimCLR under the same number of epochs and batch size, and MoCo v2 achieves higher ImageNet accuracy. With 200 epochs and a batch size of 256, the full MoCo v2 recipe (which also adds a cosine learning rate schedule) reaches 67.5% accuracy, 5.6% higher than SimCLR under the same setting. Trained for 800 epochs, MoCo v2 reaches 71.1%, outperforming SimCLR's 69.3% obtained with 1000 epochs.
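A minimal sketch of the cosine learning rate schedule using PyTorch's built-in scheduler; the SGD hyperparameters shown (learning rate 0.03, momentum 0.9, weight decay 1e-4) are values commonly used with MoCo at batch size 256 and are included here only for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 128)  # stand-in for the MoCo encoder sketched earlier

# Illustrative SGD settings for a 256 batch, not values taken from this summary.
optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, weight_decay=1e-4)

# Cosine learning rate decay over the full run (here 200 epochs), stepped per epoch.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... one epoch of contrastive training goes here ...
    scheduler.step()
```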
The paper also reports the computational cost of MoCo v2, showing that large training batches are not necessary for good accuracy. The improvements require only a few lines of changes to the MoCo v1 code, which the authors plan to release to facilitate future research.
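To make concrete why accuracy does not hinge on a large batch, recall that MoCo draws its negatives from a queue of features accumulated over previous batches rather than from the current batch. Below is a minimal sketch of the queue-based InfoNCE loss in that spirit; the feature dimension, queue size, and temperature are illustrative values, not figures taken from this summary.

```python
import torch
import torch.nn.functional as F

def moco_infonce_loss(q, k, queue, tau=0.2):
    """Queue-based InfoNCE loss.

    q:     (N, C) query features from the current batch (L2-normalized)
    k:     (N, C) key features for the same images from the momentum encoder
    queue: (C, K) dictionary of negative keys accumulated over past batches
    """
    # Positive logits: one per query, from its own key.
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)   # (N, 1)
    # Negative logits: against every key in the queue, not the current batch,
    # so the number of negatives K is independent of the batch size N.
    l_neg = torch.einsum("nc,ck->nk", q, queue)            # (N, K)

    logits = torch.cat([l_pos, l_neg], dim=1) / tau        # (N, 1+K)
    labels = torch.zeros(q.shape[0], dtype=torch.long)     # positives are at index 0
    return F.cross_entropy(logits, labels)

# Toy usage: a small batch (N=8) contrasted against a large queue (K=4096).
N, C, K = 8, 128, 4096
q = F.normalize(torch.randn(N, C), dim=1)
k = F.normalize(torch.randn(N, C), dim=1)
queue = F.normalize(torch.randn(C, K), dim=0)
print(moco_infonce_loss(q, k, queue))
```

Because the queue length is decoupled from the batch size, a small batch can still be contrasted against a large set of negatives, which is why MoCo v2 does not need the large batches that end-to-end methods rely on for negatives.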