Barlow Twins is a self-supervised learning method that learns invariant representations through redundancy reduction, a principle borrowed from neuroscience that calls for minimizing redundancy between the components of a representation. Its objective function operates on the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of the same sample, driving this matrix toward the identity matrix. This encourages the embedding vectors of distorted versions of a sample to be similar while decorrelating the components of those vectors.

Barlow Twins requires neither large batches nor asymmetry between the network twins, such as predictor networks, gradient stopping, or moving averages on the weight updates. The method is conceptually simple, avoids trivial constant solutions by construction, benefits from high-dimensional output vectors, and is robust to the training batch size. Its loss function combines an invariance term, which pulls the diagonal entries of the cross-correlation matrix toward 1, with a redundancy reduction term, which pushes the off-diagonal entries toward 0. On ImageNet, Barlow Twins outperforms previous methods for semi-supervised classification in the low-data regime and is competitive with current state-of-the-art methods for ImageNet classification and transfer tasks.
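To make the two loss terms concrete, here is a minimal PyTorch-style sketch of the loss as described above. This is an illustration, not the authors' reference implementation; the function name, the `eps` stabilizer, and the default `lambda_coeff` are assumptions for this example (the paper uses a small trade-off constant for the redundancy term).

```python
import torch

def barlow_twins_loss(z_a, z_b, lambda_coeff=5e-3, eps=1e-12):
    """Sketch of the Barlow Twins loss for two batches of embeddings.

    z_a, z_b: (N, D) projector outputs for two distorted views of the
              same batch of N samples.
    lambda_coeff: weight of the redundancy-reduction term (illustrative).
    """
    N, D = z_a.shape

    # Standardize each embedding dimension over the batch so that the
    # matrix computed below is a cross-correlation matrix.
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + eps)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + eps)

    # D x D cross-correlation matrix between the twin network outputs.
    c = (z_a.T @ z_b) / N

    # Invariance term: push diagonal entries toward 1, making each
    # feature invariant to the applied distortions.
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()

    # Redundancy reduction term: push off-diagonal entries toward 0,
    # decorrelating the different feature components.
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()

    return on_diag + lambda_coeff * off_diag
```

In a training loop, `z_a` and `z_b` would come from passing two augmented views of the same image batch through the shared encoder and projector.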
The method is also discussed in the context of the information bottleneck principle: the learned representation should conserve as much information about the sample as possible while being as uninformative as possible about the specific distortions applied to it. Barlow Twins is shown to be effective across tasks including image classification, object detection, and instance segmentation, and remains robust to small batch sizes and to different data augmentations. Compared with other self-supervised learning methods such as SimCLR and BYOL, it is found to be more effective in certain scenarios.
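One hedged way to write this trade-off, with notation assumed here ($Z_\theta$ the representation, $X$ the underlying sample, $Y$ the distorted view, and $\beta > 0$ the trade-off coefficient), is

$$\min_\theta \; I(Z_\theta, Y) \;-\; \beta\, I(Z_\theta, X),$$

which penalizes information the representation retains about the particular distortion while rewarding information it retains about the sample itself.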