8 Jan 2021 | Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
This paper introduces SwAV, an online algorithm for unsupervised learning of visual features by contrasting cluster assignments. Rather than comparing features pairwise, as contrastive methods do, SwAV enforces consistency between the cluster assignments of different views of the same image through a "swapped" prediction mechanism: the code (cluster assignment) of one view is predicted from the representation of another. Because assignments are computed online, SwAV requires neither a large memory bank nor a momentum network, is memory efficient, and scales to large datasets.

The paper also introduces multi-crop, a data augmentation strategy that mixes views of different resolutions without increasing memory or compute requirements. SwAV reaches 75.3% top-1 accuracy on ImageNet with a ResNet-50 and surpasses supervised pretraining on multiple transfer tasks. The method is effective with both small and large batch sizes, and multi-crop improves various other self-supervised methods by 2-4% top-1 on ImageNet. Across several benchmarks, SwAV outperforms prior self-supervised methods in both accuracy and efficiency; it is particularly well suited to online settings and to unsupervised pretraining on large, uncurated datasets.
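The swapped prediction mechanism can be summarized in a few lines of PyTorch. The sketch below is a minimal illustration, not the authors' implementation: the Sinkhorn-Knopp code computation follows the paper's equal-partition idea, but the function names (`sinkhorn`, `swav_loss`), temperature, and iteration count are assumptions made here for clarity.

```python
# Minimal sketch of SwAV's swapped prediction loss (illustrative, not official).
import torch
import torch.nn.functional as F

@torch.no_grad()
def sinkhorn(scores: torch.Tensor, eps: float = 0.05, iters: int = 3) -> torch.Tensor:
    """Turn prototype scores (B, K) into soft codes Q, enforcing an
    equal-partition constraint across the batch (Sinkhorn-Knopp)."""
    Q = torch.exp(scores / eps).t()               # (K, B)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # normalize over prototypes
        Q /= Q.sum(dim=0, keepdim=True); Q /= B   # normalize over samples
    return (Q * B).t()                            # (B, K), rows sum to 1

def swav_loss(z1, z2, prototypes, temp: float = 0.1):
    """Swapped prediction: the code of one view supervises the other view."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    C = F.normalize(prototypes, dim=1)            # (K, D) prototype vectors
    s1, s2 = z1 @ C.t(), z2 @ C.t()               # prototype scores
    q1, q2 = sinkhorn(s1), sinkhorn(s2)           # codes (targets), no gradient
    p1 = F.log_softmax(s1 / temp, dim=1)
    p2 = F.log_softmax(s2 / temp, dim=1)
    # "Swapped": predict view 2's code from view 1, and vice versa.
    return -0.5 * ((q2 * p1).sum(dim=1) + (q1 * p2).sum(dim=1)).mean()

# Toy usage: batch of 8, 128-d features, 32 prototypes.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
prototypes = torch.randn(32, 128, requires_grad=True)
loss = swav_loss(z1, z2, prototypes)
loss.backward()
```

Because the codes are computed without gradients, the encoder and prototypes are trained only through the log-softmax predictions, which is what lets SwAV avoid explicit pairwise feature comparisons and the memory structures they usually require.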
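Multi-crop can likewise be sketched with standard torchvision transforms. The crop counts, sizes (two global 224px views plus several 96px views), and scale ranges below are illustrative assumptions; the key point is that the additional views are low resolution, so they add little memory or compute.

```python
# Minimal sketch of multi-crop augmentation (sizes and scales assumed).
from torchvision import transforms

def multi_crop(num_local: int = 6):
    global_t = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.14, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    local_t = transforms.Compose([
        transforms.RandomResizedCrop(96, scale=(0.05, 0.14)),  # small, cheap views
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    def apply(img):
        # 2 full-resolution global views + num_local low-resolution views.
        return [global_t(img) for _ in range(2)] + \
               [local_t(img) for _ in range(num_local)]
    return apply

# Usage (hypothetical): views = multi_crop()(pil_image)
```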