2 May 2018 | Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten
This paper explores the effectiveness of weakly supervised pretraining on large-scale social media image data for visual perception tasks. The authors train convolutional networks to predict hashtags on billions of Instagram images and demonstrate that this approach transfers well to a range of tasks, including image classification and object detection. Pretraining on such data yields state-of-the-art results, including 85.4% top-1 accuracy on ImageNet-1k, significantly higher than what training on ImageNet alone achieves. The study also shows that large-scale pretraining is robust to label noise and that features learned from hashtag prediction transfer effectively. The authors highlight the importance of aligning the source and target label sets, and show that larger hashtag vocabularies improve performance on tasks with more diverse visual content. They also find that large-scale pretraining improves classification performance but may hurt localization. The study underscores the potential of natural, uncurated data for pretraining and suggests that future work should focus on tailoring pretraining tasks to specific target tasks. Overall, the results demonstrate that weakly supervised pretraining on large-scale social media data can match or exceed traditional supervised pretraining.
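To make the two-stage recipe concrete, here is a minimal sketch of hashtag-prediction pretraining followed by transfer to ImageNet-1k classification. It assumes a PyTorch setup; the backbone (a ResNet-50 stand-in rather than the paper's ResNeXt models), the vocabulary size, the loss (a multi-label BCE instead of the paper's per-image softmax over hashtags), and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

NUM_HASHTAGS = 17000   # illustrative vocabulary size (the paper explores vocabularies up to ~17k hashtags)
NUM_CLASSES = 1000     # ImageNet-1k target classes

# --- Stage 1: weakly supervised pretraining on hashtag prediction ---
backbone = models.resnet50(weights=None)          # stand-in for the paper's larger ResNeXt models
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_HASHTAGS)

# Multi-label targets: each image can carry several hashtags.
criterion = nn.BCEWithLogitsLoss()                # simplification of the paper's loss
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1, momentum=0.9)

def pretrain_step(images, hashtag_targets):
    """One optimization step on weakly labeled (image, hashtags) pairs.

    hashtag_targets: float tensor of shape (batch, NUM_HASHTAGS) with 1s for present hashtags.
    """
    logits = backbone(images)
    loss = criterion(logits, hashtag_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# --- Stage 2: transfer to ImageNet-1k by replacing the classifier head ---
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

# Either fine-tune the whole network, or freeze the trunk and train only the new head
# (both transfer regimes are evaluated in the paper):
for p in backbone.parameters():
    p.requires_grad = False
for p in backbone.fc.parameters():
    p.requires_grad = True
```

The key design point the sketch illustrates is that the hashtags act as weak, noisy labels requiring no manual annotation; only the output head changes between the pretraining and target tasks, so the learned trunk features carry all of the transfer.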