This paper introduces the context-free network (CFN), a convolutional neural network (CNN) trained to solve jigsaw puzzles as a pretext task, yielding visual representations that transfer to object classification and detection. The CFN limits the receptive field of its early processing units to a single tile at a time, forcing it to learn both the appearance of object parts and their spatial arrangement. It has fewer parameters than AlexNet while retaining comparable semantic learning capability. The paper shows that the learned features capture semantically relevant content and outperform state-of-the-art self-supervised methods on several transfer learning benchmarks.
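For concreteness, the following is a minimal PyTorch sketch of what such a context-free (siamese) architecture could look like; the layer sizes, the 64-way permutation output, and the module names are illustrative assumptions rather than the paper's exact AlexNet-based configuration. The property it illustrates is that each tile passes through a shared branch independently, so spatial context only enters at the final fully connected layers.

```python
import torch
import torch.nn as nn

class ContextFreeNetwork(nn.Module):
    """Sketch of a CFN: one shared branch processes each tile independently;
    only the final layers see all nine tiles together."""

    def __init__(self, num_permutations=64):
        super().__init__()
        # Shared per-tile branch (illustrative layer sizes, not the paper's exact config).
        self.branch = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, 512), nn.ReLU(inplace=True),
        )
        # Head that sees the concatenated per-tile features and predicts the permutation index.
        self.head = nn.Sequential(
            nn.Linear(9 * 512, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, num_permutations),
        )

    def forward(self, tiles):
        # tiles: (batch, 9, 3, H, W) -- nine shuffled tiles per image.
        feats = [self.branch(tiles[:, i]) for i in range(tiles.size(1))]
        return self.head(torch.cat(feats, dim=1))
```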
To train the CFN, each image is divided into a 3x3 grid, yielding nine tiles. The tiles are shuffled according to a permutation drawn from a fixed, predefined set, and the network must predict which permutation was applied. Training is self-supervised: the network learns from a large unlabeled image collection (ImageNet, without its labels), and the only supervision signal is the permutation index. Solving the puzzle forces the network to learn both part appearance and spatial layout, which is what makes the resulting features useful for object classification and detection.
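The construction of a training example could be sketched as follows; `make_jigsaw_example`, the tile size, and the random within-cell cropping are illustrative assumptions, and the permutation set is assumed to be precomputed (the paper selects a subset of permutations with large pairwise Hamming distance).

```python
import random
import numpy as np

def make_jigsaw_example(image, permutations, tile_size=64, grid=3):
    """Split an image (H, W, 3 array, assumed larger than grid * tile_size per side)
    into a 3x3 grid, crop one tile per cell, shuffle the tiles with a permutation
    drawn from a fixed set, and return (shuffled tiles, permutation index)."""
    h, w = image.shape[:2]
    cell_h, cell_w = h // grid, w // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            cell = image[r * cell_h:(r + 1) * cell_h, c * cell_w:(c + 1) * cell_w]
            # Crop a tile smaller than the cell at a random offset; the resulting gaps
            # discourage the network from relying on edge continuity between tiles.
            y = random.randint(0, cell.shape[0] - tile_size)
            x = random.randint(0, cell.shape[1] - tile_size)
            tiles.append(cell[y:y + tile_size, x:x + tile_size])
    label = random.randrange(len(permutations))
    shuffled = [tiles[i] for i in permutations[label]]
    return np.stack(shuffled), label
```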
The learned features are evaluated by transfer to object classification, object detection, and semantic segmentation on the PASCAL VOC benchmarks, where they outperform prior self-supervised methods and narrow the gap to supervised ImageNet pretraining. Ablation studies measure how design choices, such as the size of the permutation set and the safeguards against low-level shortcuts like edge matching and chromatic aberration, affect both puzzle accuracy and transfer performance.
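A transfer step along these lines might look like the sketch below, which reuses the pretrained branch from the CFN sketch above as a feature extractor; `build_transfer_classifier` and the single linear head are hypothetical, whereas the paper copies the pretrained convolutional weights into a standard AlexNet and fine-tunes it with the usual pipeline for each benchmark.

```python
import torch.nn as nn

def build_transfer_classifier(pretrained_cfn, num_classes=20):
    """Reuse the CFN's shared per-tile branch as a feature extractor for a
    downstream classifier (e.g., 20-way PASCAL VOC classification) and fine-tune it."""
    return nn.Sequential(
        pretrained_cfn.branch,        # convolutional features learned from the puzzle task
        nn.Linear(512, num_classes),  # new task-specific head, trained from scratch
    )
```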