**Abstract:**
Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A common approach to SSL is to learn embeddings that are invariant to distortions of the input sample, but this often leads to trivial constant solutions. This paper proposes BARLOW TWINS, an objective function that naturally avoids these solutions by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, aiming to make this matrix as close to the identity matrix as possible. The method, named after neuroscientist H. Barlow, whose *redundancy-reduction principle* it applies, does not require large batches or asymmetry between the network twins and benefits from very high-dimensional output vectors. BARLOW TWINS outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime and is competitive with state-of-the-art methods for ImageNet classification with a linear classifier head and for transfer tasks of classification and object detection.
**Introduction:**
Self-supervised learning aims to learn useful representations of input data without human annotations. Recent advances show that self-supervised representations can be competitive with supervised representations. A common theme is learning invariance under different distortions using Siamese networks. However, trivial solutions like constant representations exist, and methods rely on mechanisms like contrastive losses, asymmetric learning updates, or non-differentiable operators to avoid them. BARLOW TWINS applies redundancy-reduction, a principle from neuroscience, to self-supervised learning, aiming to decorrelate embedding vectors while preserving invariance to distortions. It is conceptually simple, easy to implement, and learns useful representations without requiring large batches or asymmetric mechanisms. BARLOW TWINS outperforms previous methods on ImageNet in semi-supervised classification and is competitive with state-of-the-art methods for ImageNet classification and transfer tasks.
**Method:**
BARLOW TWINS operates on a joint embedding of distorted images. It produces two distorted views for each image in a batch, which are then fed to a deep network to produce batches of embeddings. The loss function $\mathcal{L}_{BT}$ consists of an invariance term and a redundancy reduction term, aiming to make the cross-correlation matrix between the outputs of the two networks close to the identity matrix. This decorrelates the embedding vectors while preserving invariance to distortions.
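Concretely, the loss in the original paper is written in terms of the cross-correlation matrix $\mathcal{C}$ computed between the batch-normalized embeddings of the two views:

$$
\mathcal{L}_{BT} \;=\; \underbrace{\sum_i \left(1 - \mathcal{C}_{ii}\right)^2}_{\text{invariance term}} \;+\; \lambda \underbrace{\sum_i \sum_{j \neq i} \mathcal{C}_{ij}^{\,2}}_{\text{redundancy reduction term}},
\qquad
\mathcal{C}_{ij} \;=\; \frac{\sum_b z^A_{b,i}\, z^B_{b,j}}{\sqrt{\sum_b \big(z^A_{b,i}\big)^2}\,\sqrt{\sum_b \big(z^B_{b,j}\big)^2}},
$$

where $b$ indexes samples in the batch, $i, j$ index embedding dimensions, and $\lambda$ trades off the two terms.

A minimal PyTorch-style sketch of this loss follows. The function and variable names are illustrative (not the paper's reference code), standardization here uses a plain mean/std instead of a batch-norm layer, and the default $\lambda = 5\times10^{-3}$ is the value reported in the paper:

```python
import torch


def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lambda_: float = 5e-3) -> torch.Tensor:
    """Barlow Twins objective for two batches of embeddings of shape (N, D)."""
    n, d = z_a.shape

    # Standardize each embedding dimension over the batch (zero mean, unit variance).
    z_a_norm = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-6)
    z_b_norm = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-6)

    # Empirical cross-correlation matrix between the two views, shape (D, D).
    c = (z_a_norm.T @ z_b_norm) / n

    # Invariance term: push diagonal entries toward 1.
    invariance = ((torch.diagonal(c) - 1) ** 2).sum()

    # Redundancy-reduction term: push off-diagonal entries toward 0.
    off_diag_mask = ~torch.eye(d, dtype=torch.bool, device=c.device)
    redundancy = (c[off_diag_mask] ** 2).sum()

    return invariance + lambda_ * redundancy
```

Because the objective is symmetric in the two views and is computed over embedding dimensions rather than over pairs of samples, no predictor network, stop-gradient, or momentum encoder is needed, and the off-diagonal penalty is what benefits from a very high-dimensional projector output.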
**Results:**
BARLOW TWINS is evaluated on ImageNet for linear and semi-supervised classification, as well as on other datasets and tasks such as image classification, object detection, and instance segmentation. It achieves competitive or superior performance compared to state-of-the-art methods.
**Ablations:**
Ablation studies show that both terms in the loss function are necessary for good performance. The method is robust to small batch sizes and does not require specific data augmentations. Increasing the dimensionality of the projector network output improves performance, in contrast to other SSL methods. Finally, breaking the symmetry between the two network branches does not significantly improve performance.