Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning


10 Sep 2020 | Jean-Bastien Grill*, Florian Strub*, Florent Altché*, Corentin Tallec*, Pierre H. Richemond*, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko (* equal contribution; DeepMind, with P. H. Richemond also at Imperial College London)
Bootstrap Your Own Latent (BYOL) is a novel self-supervised method for image representation learning. BYOL uses two neural networks: an online network and a target network. The online network is trained to predict the target network's representation of the same image under a different augmentation, while the target network's parameters are updated as a slow-moving (exponential) average of the online network's parameters. This design removes the need for the negative pairs on which contrastive learning methods rely.

BYOL achieves state-of-the-art results on ImageNet, reaching 74.3% top-1 accuracy under linear evaluation with a ResNet-50 and 79.6% with a larger ResNet, and it performs on par with or better than the current state of the art on transfer and semi-supervised benchmarks. Its representations are evaluated across a range of vision tasks, including classification, segmentation, object detection, and depth estimation. Compared to contrastive methods, BYOL is also more robust to changes in the set of image augmentations and to the batch size.

The method builds on the idea of bootstrapping: the target network's representation serves as a moving target that the online network learns to predict, thereby improving its own representation. The paper also discusses why BYOL avoids collapsed representations, attributing this to the combination of the slow-moving average target network and the predictor placed on top of the online projection. BYOL is implemented with a ResNet architecture, and the code is available on GitHub.
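To make the mechanism concrete, here is a minimal, illustrative sketch in Python/NumPy of the two updates described above: the BYOL regression loss (mean squared error between the L2-normalized online prediction and target projection, equivalent to 2 − 2 × cosine similarity) and the slow-moving-average target update. The toy linear "networks", the batch, and the helper names (`byol_loss`, `ema_update`) are assumptions made for illustration, not the authors' implementation; the real method uses a ResNet encoder with MLP projector and predictor heads, and a base target decay rate of 0.996.

```python
import numpy as np

# Illustrative sketch of BYOL's two core updates, using toy linear "networks"
# in place of the paper's ResNet encoder plus MLP projector/predictor heads.

rng = np.random.default_rng(0)
DIM = 8  # toy feature dimension

# Online and target parameters; the target starts as a copy of the online network.
online_params = rng.normal(size=(DIM, DIM))
target_params = online_params.copy()
predictor_params = rng.normal(size=(DIM, DIM))  # predictor exists only on the online side


def l2_normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)


def byol_loss(online_proj, target_proj, predictor):
    """MSE between L2-normalized prediction and target projection,
    equivalent to 2 - 2 * cosine similarity."""
    q = l2_normalize(online_proj @ predictor)
    z = l2_normalize(target_proj)  # in training, no gradient flows through z
    return np.mean(np.sum((q - z) ** 2, axis=-1))


def ema_update(target, online, tau=0.996):
    """Target update: exponential moving average of the online parameters."""
    return tau * target + (1.0 - tau) * online


# One (symmetrized) step on a toy batch: v and v_prime stand in for the
# features of two augmented views of the same images.
v, v_prime = rng.normal(size=(2, 4, DIM))
loss = (byol_loss(v @ online_params, v_prime @ target_params, predictor_params)
        + byol_loss(v_prime @ online_params, v @ target_params, predictor_params))
print(f"symmetrized BYOL loss: {loss:.4f}")

# After each optimizer step on the online/predictor parameters,
# the target network trails the online network via the EMA.
target_params = ema_update(target_params, online_params)
```

Because only the online network receives gradients while the target trails it through the EMA, the prediction target changes slowly; together with the predictor, this is what the paper credits for preventing the two networks from collapsing to a constant representation.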