Probabilistic Contrastive Learning for Long-Tailed Visual Recognition

14 Mar 2024 | Chaoqun Du, Yulin Wang, Shiji Song, and Gao Huang
The paper introduces a novel probabilistic contrastive learning (ProCo) algorithm to address the long-tailed distribution problem in visual recognition. Long-tailed distributions, in which the number of samples per class decays exponentially, pose significant challenges for standard supervised learning algorithms. Supervised contrastive learning (SCL) has shown promise in alleviating this imbalance, but it requires large training batches to supply enough contrastive pairs, which is often impractical on class-imbalanced datasets. To overcome this, ProCo estimates the feature-space distribution of each class and samples contrastive pairs from it. The key innovation is the assumption that normalized features in contrastive learning follow a mixture of von Mises-Fisher (vMF) distributions on the unit hypersphere. This assumption permits efficient parameter estimation from only the first sample moment and yields a closed-form expression for the expected contrastive loss, eliminating the need to explicitly sample large numbers of contrastive pairs. The method also extends to semi-supervised learning by generating pseudo-labels for unlabeled data. Extensive experiments on various datasets demonstrate that ProCo consistently outperforms existing methods on both supervised and semi-supervised visual recognition and object detection tasks. The code for ProCo is available at <https://github.com/LeapLabTHU/ProCo>.
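To make the two key steps concrete, here is a minimal NumPy/SciPy sketch, not the authors' implementation: function names are illustrative, and the concentration estimate uses the standard Banerjee et al. (2005) approximation, which may differ from the paper's exact estimator. It fits each class's vMF parameters from the first sample moment of its normalized features, then evaluates the expected exponentiated similarity E[exp(z·v/τ)] in closed form via the vMF moment generating function, the kind of identity that lets an expected contrastive loss be computed without explicit sampling.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel I_nu

def estimate_vmf(features):
    """Estimate vMF mean direction and concentration from the first
    sample moment of L2-normalized features.

    Illustrative sketch: kappa uses the Banerjee et al. (2005)
    approximation, not necessarily the paper's exact estimator."""
    d = features.shape[1]
    m = features.mean(axis=0)            # first sample moment
    r_bar = np.linalg.norm(m)            # mean resultant length, in (0, 1)
    mu = m / r_bar                       # mean direction on the unit sphere
    kappa = r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)
    return mu, kappa

def log_vmf_norm_const(kappa, d):
    """log C_d(kappa) for the vMF density C_d(kappa) * exp(kappa * mu . z),
    computed stably via log I_nu(k) = log ive(nu, k) + k."""
    nu = d / 2.0 - 1.0
    return nu * np.log(kappa) - (d / 2.0) * np.log(2.0 * np.pi) \
        - (np.log(ive(nu, kappa)) + kappa)

def log_expected_exp_sim(mu, kappa, v, tau):
    """Closed form for log E_{z ~ vMF(mu, kappa)}[exp(z . v / tau)].

    Follows from the vMF moment generating function:
    E[exp(t . z)] = C_d(kappa) / C_d(||kappa * mu + t||)."""
    d = mu.shape[0]
    kappa_tilde = np.linalg.norm(kappa * mu + v / tau)
    return log_vmf_norm_const(kappa, d) - log_vmf_norm_const(kappa_tilde, d)

# Toy usage: features clustered around one direction on the unit sphere.
rng = np.random.default_rng(0)
x = rng.normal(loc=(1.0, 0.0, 0.0), scale=0.3, size=(512, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)
mu, kappa = estimate_vmf(x)
anchor = np.array([1.0, 0.0, 0.0])
print(mu, kappa, log_expected_exp_sim(mu, kappa, anchor, tau=0.1))
```

Because the expectation admits this closed form, the loss can be evaluated directly from compact per-class statistics (μ, κ) rather than by drawing large numbers of contrastive pairs, which is what removes SCL's large-batch requirement on imbalanced data.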