7 Feb 2021 | Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, Fillia Makedon
This paper provides an extensive review of self-supervised learning methods that utilize contrastive learning, a dominant approach in computer vision and natural language processing (NLP). Contrastive learning aims to embed augmented versions of the same sample close to each other while pushing away embeddings from different samples. The paper covers commonly used pretext tasks, such as color and geometric transformations, context-based tasks like jigsaw puzzles, and cross-modal tasks. It also discusses different architectures for contrastive learning, including end-to-end learning, memory banks, momentum encoders, and clustering of feature representations. The performance of these methods on downstream tasks like image classification, object detection, and action recognition is evaluated, showing comparable or superior results to state-of-the-art supervised models. The paper concludes by discussing limitations and future directions, emphasizing the need for more theoretical analysis and addressing issues like dataset biases and negative sampling.
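To make the core idea concrete, here is a minimal sketch of a SimCLR-style contrastive (NT-Xent) loss in PyTorch: embeddings of two augmented views of the same batch are pulled together as positive pairs, while all other samples in the batch serve as negatives. The function name `nt_xent_loss`, the toy batch, and the perturbation standing in for augmentation are illustrative assumptions, not code from the surveyed paper.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z_i, z_j, temperature=0.5):
    """Normalized temperature-scaled cross-entropy loss (a common contrastive objective).

    z_i, z_j: embeddings of two augmented views of the same batch, each (N, dim).
    Positive pairs are (z_i[k], z_j[k]); every other sample in the batch is a negative.
    """
    batch_size = z_i.size(0)
    z = torch.cat([z_i, z_j], dim=0)          # (2N, dim)
    z = F.normalize(z, dim=1)                 # unit-norm so dot products are cosine similarities
    sim = z @ z.t() / temperature             # (2N, 2N) similarity matrix

    # Mask the diagonal so a sample is never compared with itself.
    mask = torch.eye(2 * batch_size, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))

    # The positive for index k is k + N (and vice versa).
    targets = torch.cat([
        torch.arange(batch_size, 2 * batch_size),
        torch.arange(0, batch_size),
    ]).to(z.device)

    return F.cross_entropy(sim, targets)


if __name__ == "__main__":
    # Toy usage: random tensors stand in for encoder outputs of two augmented views.
    torch.manual_seed(0)
    view_a = torch.randn(8, 128)
    view_b = view_a + 0.1 * torch.randn(8, 128)  # mild perturbation as a stand-in for augmentation
    print(nt_xent_loss(view_a, view_b).item())
```

In practice the embeddings would come from an encoder plus projection head applied to two independently augmented crops of each image (e.g., random cropping, color jitter, geometric transforms), and the temperature and batch size strongly affect how hard the negatives are.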