18 Dec 2020 | Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, Phillip Isola
This paper explores the importance of view selection in contrastive learning, a self-supervised representation learning technique. The authors argue that the optimal views for contrastive learning should minimize mutual information (MI) while retaining task-relevant information. They propose an "InfoMin principle," suggesting that effective views should share only the necessary information for the downstream task. To validate this hypothesis, they develop unsupervised and semi-supervised frameworks to learn effective views by reducing MI. They also demonstrate that increasing data augmentation can further reduce MI and improve downstream classification accuracy. As a result, their method achieves a new state-of-the-art accuracy of 73% on the ImageNet linear readout benchmark with a ResNet-50.
The paper includes theoretical and empirical analyses, showing that optimal views depend on the downstream task and that data augmentation can guide the construction of views to achieve the sweet spot in MI and accuracy.
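Contrastive objectives of this kind typically optimize the InfoNCE loss, which is a lower bound on the mutual information between two views: pulling positive pairs together while pushing negatives apart maximizes the shared information the encoder retains. The following is a minimal, generic NumPy sketch of InfoNCE (not the authors' implementation; the function name, batch shapes, and temperature value are illustrative assumptions):

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.07):
    """Generic InfoNCE loss between two batches of view embeddings.

    Minimizing this loss maximizes a lower bound on the mutual
    information I(v1; v2) between the two views, which is why
    view construction (augmentation strength) directly controls
    how much information the encoder can capture.
    """
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) pairwise similarities
    # Positive pairs sit on the diagonal; other entries are negatives
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 32))
# Unrelated views share little information: high loss
loss_random = info_nce_loss(z1, rng.normal(size=(8, 32)))
# Nearly identical views share almost everything: low loss
loss_aligned = info_nce_loss(z1, z1 + 0.01 * rng.normal(size=(8, 32)))
```

In the paper's framing, the InfoMin sweet spot lies between these two extremes: views aligned enough to preserve task-relevant signal, but augmented enough that nuisance information is discarded.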