Supervised Contrastive Learning


10 Mar 2021 | Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan
This paper introduces a supervised contrastive loss function, SupCon, which extends self-supervised contrastive learning to the fully supervised setting. By leveraging label information, SupCon pulls together clusters of points belonging to the same class in embedding space while pushing apart clusters from different classes. The authors analyze two possible formulations of the supervised contrastive loss and identify the better-performing one. On the ImageNet dataset, SupCon achieves 81.4% top-1 accuracy with a ResNet-200, surpassing the best reported accuracy for this architecture. Compared to cross-entropy, the loss is more robust to natural corruptions and more stable under changes to hyperparameter settings such as optimizers and data augmentations. SupCon is simple to implement, stable to train, and delivers consistent gains across datasets and ResNet variants. The authors also provide a detailed analysis of the gradients of the loss, showing that it intrinsically focuses learning on hard positives and negatives and thus performs hard positive/negative mining without any explicit mining algorithm.
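For concreteness, the better-performing formulation identified in the paper (the $\mathcal{L}^{sup}_{out}$ variant, which places the average over positives outside the logarithm) is

$$\mathcal{L}^{sup}_{out} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}$$

where $z_i$ are normalized embeddings, $\tau$ is a temperature, $A(i)$ is the set of all samples other than anchor $i$, and $P(i)$ is the subset of $A(i)$ sharing the anchor's label. The following is a minimal PyTorch sketch of this loss under stated assumptions (L2-normalized features, one view per sample); it is an illustration, not the authors' reference implementation, and the name `supcon_loss` is ours.

```python
# Minimal sketch of the SupCon (L_out) loss. Assumes `features` is an (N, D)
# tensor of L2-normalized embeddings and `labels` is an (N,) tensor of
# integer class ids. `supcon_loss` is a hypothetical name for illustration.
import torch

def supcon_loss(features, labels, temperature=0.1):
    n = features.shape[0]
    device = features.device

    # Pairwise cosine-similarity logits, scaled by the temperature tau.
    logits = features @ features.T / temperature

    # Exclude self-comparisons: A(i) contains every sample except anchor i.
    self_mask = torch.eye(n, dtype=torch.bool, device=device)
    logits = logits.masked_fill(self_mask, float("-inf"))

    # P(i): samples sharing the anchor's label, excluding the anchor itself.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Row-wise log-softmax implements log( exp(.) / sum_a exp(.) ).
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Average log-probability over each anchor's positives; anchors with no
    # positive in the batch contribute zero loss.
    pos_counts = pos_mask.sum(dim=1)
    sum_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    mean_log_prob_pos = sum_log_prob_pos / pos_counts.clamp(min=1)
    return -(mean_log_prob_pos * (pos_counts > 0).float()).mean()
```

In the paper, this loss is applied to the output of a projection head during a pretraining stage; a linear classifier is then trained on top of the frozen encoder with standard cross-entropy to obtain the reported top-1 accuracy.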