4 Aug 2017 | Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta
The paper "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era" by Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta explores the relationship between large datasets and the performance of deep learning models in computer vision tasks. Despite significant advancements in model capacity and computational power, the size of the largest datasets has remained constant since 2012. The authors investigate the impact of increasing the dataset size by 10 times or 100 times using the JFT-300M dataset, which contains over 375 million noisy labels for 300 million images.
Key findings include:
1. **Performance Increase**: Performance on vision tasks increases logarithmically with the volume of training data (see the sketch following this list).
2. **Representation Learning**: Representation learning (or pre-training) remains crucial and can significantly improve performance on various vision tasks.
3. **Model Capacity**: Higher-capacity models are better at utilizing large datasets.
4. **Long-tail Data**: The long-tail distribution of the JFT-300M dataset does not negatively affect the performance of ConvNets.
5. **State-of-the-Art Results**: The paper presents new state-of-the-art results for image classification, object detection, semantic segmentation, and human pose estimation using models trained on the JFT-300M dataset.
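To make the shape of finding 1 concrete, here is a minimal sketch of the log-linear relationship. The slope and intercept are hypothetical placeholders chosen only to illustrate the form of the curve; the paper fits such curves per task, and these are not its reported numbers:

```python
import numpy as np

# Minimal sketch of the log-linear scaling trend the paper reports:
# downstream performance grows roughly linearly in log(pre-training set size).
# The slope and intercept below are hypothetical placeholders, NOT values
# from the paper.

def predicted_score(num_images: float, slope: float = 1.5, intercept: float = 60.0) -> float:
    """Illustrative form: score ~ slope * log10(num_images) + intercept."""
    return slope * np.log10(num_images) + intercept

if __name__ == "__main__":
    for n in (1e6, 10e6, 100e6, 300e6):
        print(f"{int(n):>12,} images -> predicted score {predicted_score(n):.1f}")
    # Each 10x increase in data adds a roughly constant amount (the slope),
    # which is why the authors argue larger datasets still pay off at this scale.
```

Under this form, going from 10M to 100M images buys about as much as going from 100M to 1B, so gains keep coming but require ever larger data collections.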
The authors argue that the field should not undervalue the importance of data and should collectively work on building larger datasets to further advance deep learning in computer vision.