Momentum Contrast for Unsupervised Visual Representation Learning


23 Mar 2020 | Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
Momentum Contrast (MoCo) is a method for unsupervised visual representation learning that frames contrastive learning as dictionary look-up: a contrastive loss matches an encoded query against a dynamic dictionary of encoded keys. The dictionary is maintained as a queue of encoded samples, which decouples its size from the mini-batch size and allows the set of keys to be large as it evolves during training. A momentum update of the key encoder keeps the keys consistent across iterations, enabling effective contrastive learning. MoCo is trained with a simple instance discrimination task: a query matches a key if they are encoded views (e.g., different augmented crops) of the same image.

MoCo achieves competitive results on ImageNet classification and can outperform its supervised pre-training counterpart on several downstream tasks, including detection and segmentation on Pascal VOC, COCO, and other datasets. This suggests that the gap between unsupervised and supervised representation learning has largely been closed for many vision tasks. MoCo also remains effective when pre-trained on a large-scale, relatively uncurated dataset of Instagram images, demonstrating its utility in real-world scenarios. Compared with other contrastive learning approaches, MoCo's queue-based dictionary and momentum update are what allow its dictionary to be both large and consistent. The transfer results on object detection, instance segmentation, and semantic segmentation indicate that MoCo can serve as a strong alternative to ImageNet supervised pre-training in many applications.
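The two mechanisms described above, the momentum update of the key encoder and the contrastive loss over a queue of keys, can be sketched in a few lines. The following is a minimal PyTorch-style illustration, not the authors' exact implementation: the encoder modules, queue management, and hyperparameter values (momentum m, temperature T) are assumptions based on the paper's description.

```python
import torch
import torch.nn.functional as F

def momentum_update(encoder_q, encoder_k, m=0.999):
    # The key encoder's parameters are an exponential moving average of the
    # query encoder's parameters; only the query encoder is updated by back-prop.
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

def contrastive_loss(q, k, queue, T=0.07):
    # q: N x C encoded queries, k: N x C positive keys, queue: C x K negative keys.
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    l_pos = torch.einsum("nc,nc->n", [q, k]).unsqueeze(-1)   # N x 1 positive logits
    l_neg = torch.einsum("nc,ck->nk", [q, queue])            # N x K negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / T            # N x (1+K)
    # The positive key sits at index 0, so the target class is 0 for every query.
    labels = torch.zeros(logits.shape[0], dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```

In a full training loop, each step would also enqueue the newly encoded keys and dequeue the oldest mini-batch of keys, keeping the dictionary size fixed while its contents progressively evolve.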