Learning Representations by Maximizing Mutual Information Across Views

8 Jul 2019 | Philip Bachman, R Devon Hjelm, William Buchwalter
The paper introduces a self-supervised representation learning approach that maximizes mutual information between features extracted from multiple views of a shared context. The authors propose Augmented Multiscale Deep InfoMax (AMDIM), which extends the local Deep InfoMax (DIM) model by predicting features across independently-augmented views, predicting at multiple scales, and using a more powerful encoder. AMDIM is evaluated on standard datasets such as CIFAR10, CIFAR100, STL10, ImageNet, and Places205, achieving significant improvements over prior methods. Notably, AMDIM achieves 68.1% accuracy on ImageNet using standard linear evaluation, outperforming the best prior result by over 12% and concurrent results by 7%. The model also exhibits segmentation behavior when using mixture-based representations. The authors discuss the method's extensions, experiments, and future research directions.
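The mutual-information objective behind this family of methods is typically estimated with an NCE-style contrastive bound: features from two augmented views of the same image form a positive pair, and features from other images in the batch serve as negatives. A minimal sketch of such a loss is below; the function name, toy data, and NumPy implementation are illustrative assumptions, not the paper's actual code (AMDIM additionally predicts across feature-map locations and scales, which this sketch omits).

```python
import numpy as np

def cross_view_nce_loss(feats_a, feats_b, temperature=0.1):
    """NCE-style lower bound on mutual information between paired views.

    feats_a, feats_b: (n, d) arrays; row i of each comes from a different
    augmented view of the same image (a positive pair). All mismatched
    cross-view rows act as negative samples. Illustrative sketch only.
    """
    # Cosine-similarity logits between every cross-view feature pair.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                      # shape (n, n)
    # Softmax cross-entropy with the matching view as the target class.
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy check: features that agree across views score a lower loss than noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))
aligned = cross_view_nce_loss(shared + 0.01 * rng.normal(size=(8, 16)), shared)
random = cross_view_nce_loss(rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))
```

Minimizing this loss pushes each feature to identify its counterpart from the other view among the negatives, which is what ties the representation to content shared across augmentations.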