SimSiam is a simple Siamese network that learns meaningful visual representations without negative sample pairs, large batches, or momentum encoders. The method takes two augmented views of the same image, processes both with a shared encoder, and applies a prediction MLP on one branch to match the output of the other. A stop-gradient operation is applied to the branch without the predictor; ablations show this operation is critical for preventing collapsing solutions, and the authors hypothesize that SimSiam implicitly optimizes an Expectation-Maximization-like objective. The method achieves competitive results on ImageNet and transfers well to downstream tasks, outperforming other self-supervised methods on object detection and instance segmentation. Simple and effective, SimSiam can serve as a baseline for further research on Siamese architectures in unsupervised representation learning.
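The core of the method fits in a few lines. Below is a minimal PyTorch sketch of the symmetrized SimSiam objective, with the stop-gradient implemented via `.detach()`; the `backbone` argument, the dimensions, and the exact predictor layers are illustrative assumptions rather than a verbatim reproduction of the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimSiam(nn.Module):
    """Sketch of SimSiam: shared encoder f plus prediction MLP h."""

    def __init__(self, backbone: nn.Module, dim: int = 2048, pred_dim: int = 512):
        super().__init__()
        # Encoder f: a backbone assumed to end in a projection MLP
        # producing `dim`-dimensional outputs (an assumption here).
        self.encoder = backbone
        # Prediction MLP h with a bottleneck, applied to one branch only.
        self.predictor = nn.Sequential(
            nn.Linear(dim, pred_dim),
            nn.BatchNorm1d(pred_dim),
            nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim),
        )

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # Two augmented views pass through the same (shared) encoder.
        z1, z2 = self.encoder(x1), self.encoder(x2)
        # The predictor maps one branch's output toward the other's.
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # detach() acts as the stop-gradient on the target branch.
        return p1, p2, z1.detach(), z2.detach()

def negative_cosine(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    # D(p, z) = -cos(p, z); z carries no gradient by construction.
    return -F.cosine_similarity(p, z, dim=-1).mean()

def simsiam_loss(p1, p2, z1, z2) -> torch.Tensor:
    # Symmetrized loss: L = D(p1, z2)/2 + D(p2, z1)/2.
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)
```

In a training step, the loss is computed from both views and gradients flow only through the prediction branches; replacing the detached targets with the raw encoder outputs removes the stop-gradient and reproduces the collapse the paper describes.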