Deep Clustering for Unsupervised Learning of Visual Features

DeepCluster is a clustering method for unsupervised learning of visual features. It jointly learns the parameters of a neural network and the cluster assignments of the resulting features: it iteratively groups the features with k-means and uses the resulting assignments as pseudo-labels to update the network weights. The authors apply DeepCluster to the unsupervised training of convolutional neural networks on large datasets such as ImageNet and YFCC100M, and report that the resulting model outperforms the state of the art at the time on standard benchmarks.

The paper discusses why clustering is hard to adapt to end-to-end training of convnets: traditional clustering methods are designed for linear models on top of fixed features and work poorly when the features are learned at the same time. DeepCluster addresses this by alternating between clustering the image descriptors and updating the convnet weights to predict the cluster assignments. It uses k-means, though other clustering methods such as Power Iteration Clustering (PIC) can be substituted. The method is robust to a change of architecture, as shown by replacing AlexNet with VGG, and it still performs well when trained on an uncurated image distribution such as YFCC100M.

The paper evaluates DeepCluster on image classification, object detection, and semantic segmentation, where it outperforms previous unsupervised methods on ImageNet and other benchmarks. It is also evaluated on image retrieval, demonstrating that the features capture instance-level as well as class-level information. Comparisons with self-supervised and generative models indicate that it learns general-purpose visual features without requiring labeled data.
The paper concludes that DeepCluster is a scalable and effective approach for unsupervised learning of visual features, capable of achieving state-of-the-art performance on various tasks. It is particularly useful for domains where annotations are scarce, as it does not require domain-specific knowledge or labeled data.
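
The alternation described in the summary (cluster the current features with k-means, then train the network to predict the cluster assignments as pseudo-labels) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy setup in which a single linear layer stands in for the convnet, k-means is plain Lloyd's algorithm, and the classifier head is re-initialized each epoch because cluster indices are arbitrary from one clustering to the next. All names (`kmeans`, `features`, `sgd_epoch`, `deepcluster`) and hyperparameters are hypothetical, chosen for this example only.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid in squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def features(weights, x):
    """Matrix-vector product; used as the toy 'network' (assumption: linear)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sgd_epoch(theta, head, data, labels, lr):
    """One SGD pass on cross-entropy against the pseudo-labels; returns mean loss."""
    total = 0.0
    for x, y in zip(data, labels):
        f = features(theta, x)        # features from the toy network
        logits = features(head, f)    # linear classifier on top of the features
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        s = sum(exps)
        total += math.log(s) - (logits[y] - m)  # stable cross-entropy
        grad = [e / s - (1.0 if c == y else 0.0) for c, e in enumerate(exps)]
        # Gradient w.r.t. the features, computed before the head is updated.
        df = [sum(grad[c] * head[c][j] for c in range(len(head)))
              for j in range(len(f))]
        for c, row in enumerate(head):
            for j in range(len(row)):
                row[j] -= lr * grad[c] * f[j]
        for j, row in enumerate(theta):
            for i in range(len(row)):
                row[i] -= lr * df[j] * x[i]
    return total / len(data)


def deepcluster(data, k, epochs=5, feat_dim=2, lr=0.02, seed=0):
    """Alternate between clustering features and training on the pseudo-labels."""
    rng = random.Random(seed)
    d_in = len(data[0])
    theta = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(feat_dim)]
    losses, labels = [], []
    for _ in range(epochs):
        # 1) Cluster the current features to obtain pseudo-labels.
        feats = [features(theta, x) for x in data]
        labels = kmeans(feats, k, rng)
        # 2) Fresh classifier head (a choice made in this sketch).
        head = [[rng.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(k)]
        # 3) Update the network and head to predict the pseudo-labels.
        losses.append(sgd_epoch(theta, head, data, labels, lr))
    return theta, labels, losses


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]
            for cx, cy in [(-1.5, -1.5), (1.5, 1.5)] for _ in range(20)]
    theta, labels, losses = deepcluster(data, k=2)
    print(labels)
    print(losses)
```

In a realistic setting the linear layer would be a convnet and the clustering would run on its image descriptors over the whole dataset once per epoch. The sketch only shows the control flow of the alternation, not the scale or the training details of the paper.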

18 Mar 2019 | Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze