8 Jul 2019 | Philip Bachman, R Devon Hjelm, William Buchwalter
This paper proposes a self-supervised representation learning method based on maximizing mutual information between features extracted from multiple views of a shared context. The approach, called Augmented Multiscale Deep InfoMax (AMDIM), extends the local version of Deep InfoMax (DIM) in three ways: it maximizes mutual information between features from independently augmented views of each input, it maximizes mutual information across multiple feature scales, and it uses a more powerful encoder. The model also introduces mixture-based representations, which naturally lead to segmentation-like behavior.
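A minimal sketch of the two-view, multiscale setup is shown below, assuming a PyTorch-style pipeline. The augmentation parameters and the toy encoder are illustrative placeholders, not the augmentation policy or architecture used in the paper.

```python
# Sketch: create two independently augmented views of the same image and
# extract features at two scales (a spatial map of local features and a
# pooled global vector) from each view. All specifics here are assumptions.
import torch
import torch.nn as nn
from torchvision import transforms

# The same stochastic augmentation applied twice to one image -> two "views".
augment = transforms.Compose([
    transforms.RandomResizedCrop(128, scale=(0.3, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])

class TinyEncoder(nn.Module):
    """Stand-in for the paper's modified ResNet encoder: returns a spatial
    map of local features plus a pooled global feature for each view."""
    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        local_map = self.conv(x)                      # (B, dim, H', W') local features
        global_vec = self.pool(local_map).flatten(1)  # (B, dim) global feature
        return local_map, global_vec

# Usage: given a PIL image `img`, build two views and encode both;
# mutual information is then maximized across views and across scales.
# view_a, view_b = augment(img), augment(img)
# enc = TinyEncoder()
# local_a, global_a = enc(view_a.unsqueeze(0))
# local_b, global_b = enc(view_b.unsqueeze(0))
```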
AMDIM outperforms prior methods on several benchmark datasets, including CIFAR10, CIFAR100, STL10, ImageNet, and Places205. On ImageNet, the model achieves 68.1% accuracy using standard linear evaluation, surpassing previous results by over 12%. On STL10, the model reaches over 94% accuracy with linear evaluation, significantly improving upon prior self-supervised results. The model also performs well on Places205, achieving 55% accuracy with ImageNet-pretrained features, which is 7% better than the best prior result.
The model uses data augmentation to generate multiple views of each input and maximizes mutual information between features extracted from those views. The mutual information objective is approximated with a noise-contrastive estimation (NCE) bound, and regularization is applied to keep training stable. The encoder is a modified ResNet, adjusted to control receptive fields and keep the feature distributions well behaved.
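One common way to realize an NCE bound of this kind is a softmax contrastive (InfoNCE-style) loss over positive pairs and in-batch negatives. The sketch below follows that pattern as a simplified assumption; it omits the score regularization and other stabilization details used in the paper.

```python
import torch
import torch.nn.functional as F

def nce_loss(global_a, local_b):
    """Sketch of an NCE-style contrastive bound (assumed simplification):
    the global feature of view A should score highly against the local
    features of view B from the same image (positives) and low against
    local features from other images in the batch (negatives).
    Shapes: global_a is (B, D); local_b is (B, D, H, W)."""
    B, D, H, W = local_b.shape
    locals_flat = local_b.permute(0, 2, 3, 1).reshape(B, H * W, D)  # (B, HW, D)
    # Score every global vector i against every local location of image j.
    scores = torch.einsum('id,jkd->ijk', global_a, locals_flat)    # (B, B, HW)
    targets = torch.arange(B, device=scores.device)  # positive pair = same image
    # Average a softmax cross-entropy (InfoNCE-style) over the HW locations.
    loss = sum(F.cross_entropy(scores[:, :, k], targets) for k in range(H * W))
    return loss / (H * W)
```

In this simplified form the negatives come only from the other images in the minibatch; the paper draws negatives more broadly and applies additional regularization to the scores.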
Experiments show that AMDIM achieves strong performance across a range of tasks, including image classification and transfer learning. Its ability to learn meaningful representations from unlabeled data makes it a promising approach for self-supervised learning. The paper also examines mixture-based representations, which naturally lead to segmentation-like behavior but make training more sensitive to hyperparameters. Overall, AMDIM represents a significant advance in self-supervised representation learning, offering improved performance and computational efficiency.