2024 | Chaoqun Du, Yulin Wang, Shiji Song, and Gao Huang
Probabilistic Contrastive Learning for Long-Tailed Visual Recognition proposes a novel algorithm, ProCo, to address the challenges of long-tailed data distributions in visual recognition. Long-tailed distributions are common in real-world data, where many minority categories have only a few samples, degrading the performance of standard supervised learning. ProCo estimates the distribution of samples in the feature space and samples contrastive pairs accordingly. It models the feature distribution with a mixture of von Mises-Fisher (vMF) distributions on the unit sphere, which permits efficient parameter estimation and yields a closed-form expected contrastive loss. This approach eliminates the need for large batch sizes and improves performance on both supervised and semi-supervised tasks. ProCo also extends to semi-supervised learning by generating pseudo-labels for unlabeled data. Theoretical analysis shows that ProCo has a bounded error, and empirical results on various benchmark datasets demonstrate its effectiveness. The method is implemented in Python and available at the provided GitHub link.
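To illustrate the vMF modeling step, the sketch below estimates the mean direction and concentration of a vMF distribution from unit-norm features using the classic moment-based approximation of Banerjee et al. (kappa ≈ r̄(d − r̄²)/(1 − r̄²)). This is a simplified batch version, not ProCo's own per-class online estimator, and the function name is illustrative:

```python
import numpy as np

def estimate_vmf_params(features):
    """Estimate vMF parameters from unit-norm feature vectors.

    Sketch using the standard approximation kappa ~= r_bar * (d - r_bar^2) / (1 - r_bar^2),
    where r_bar is the norm of the sample mean; not the paper's exact estimator.
    """
    d = features.shape[1]
    mean = features.mean(axis=0)
    r_bar = np.linalg.norm(mean)
    mu = mean / r_bar                                 # mean direction on the unit sphere
    kappa = r_bar * (d - r_bar ** 2) / (1 - r_bar ** 2)  # concentration parameter
    return mu, kappa

# Toy usage: noisy unit vectors clustered around a common direction
rng = np.random.default_rng(0)
base = np.array([1.0, 0.0, 0.0])
noisy = base + 0.1 * rng.standard_normal((100, 3))
unit = noisy / np.linalg.norm(noisy, axis=1, keepdims=True)
mu, kappa = estimate_vmf_params(unit)
```

A tighter cluster (smaller noise) yields a larger kappa, mirroring how a well-separated class occupies a concentrated region of the hypersphere.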