Exploring the Limits of Weakly Supervised Pretraining

2 May 2018 | Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten
This paper explores the limits of weakly supervised pretraining by training convolutional networks to predict hashtags on billions of social media images. The study demonstrates that large-scale hashtag prediction leads to excellent results on image classification and object detection tasks. The authors achieve the highest ImageNet-1k single-crop top-1 accuracy reported to date, 85.4%, with a top-5 accuracy of 97.6%. They also find that "hashtag engineering" and training on large-scale hashtag data are promising directions for improving transfer learning performance. The paper provides extensive experimental data on the relationship between large-scale pretraining and transfer learning, highlighting the importance of selecting an appropriate label space and of increasing visual variety in benchmark tasks. The results suggest that current network architectures may underfit when trained on billions of images, and that further increases in model capacity can improve performance. Additionally, the study shows that training for large-scale hashtag prediction improves classification while potentially harming localization performance in object detection tasks.
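The pretraining objective in the paper treats hashtag prediction as a multi-label problem but uses a per-image softmax: the target distribution places probability 1/k on each of an image's k hashtags, and the model minimizes cross-entropy against that target. Below is a minimal PyTorch-style sketch of that loss; the function name and tensor layout are illustrative choices, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def hashtag_softmax_loss(logits, hashtag_ids):
    """Cross-entropy against a uniform target over each image's hashtags.

    logits:      (batch, vocab_size) scores over the hashtag vocabulary
    hashtag_ids: list of 1-D LongTensors, the k >= 1 hashtag indices per image
    """
    log_probs = F.log_softmax(logits, dim=1)
    losses = []
    for i, ids in enumerate(hashtag_ids):
        # Target puts mass 1/k on each of the image's k hashtags, so the
        # cross-entropy reduces to the mean negative log-probability of them.
        losses.append(-log_probs[i, ids].mean())
    return torch.stack(losses).mean()

# Example: a batch of 2 images over a toy vocabulary of 5 hashtags.
logits = torch.randn(2, 5)
targets = [torch.tensor([0, 3]), torch.tensor([2])]
print(hashtag_softmax_loss(logits, targets))
```

For transfer, the paper evaluates the pretrained network both as a fixed feature extractor (training a classifier on frozen features) and with full fine-tuning on the target task.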