4 Aug 2017 | Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta
This paper investigates the impact of large-scale data on deep learning for vision, using the JFT-300M dataset of 300 million images with over 375 million noisy labels. It studies how increasing the amount of training data affects performance on vision tasks such as image classification, object detection, semantic segmentation, and human pose estimation. Performance is found to increase logarithmically with the amount of training data, indicating that more data can still provide significant improvements.
The paper also demonstrates that representation learning (pretraining) remains a promising approach: better base models improve performance across a variety of tasks, and higher-capacity models are more effective at exploiting large-scale data. Pretraining on JFT-300M yields new state-of-the-art results on several vision tasks. The findings suggest that data remains a critical factor in visual deep learning, and that efforts should be made to build larger and more diverse datasets. The study also highlights the importance of data quality: noisy labels can hurt performance, and careful processing is needed to mitigate this issue.
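The logarithmic scaling trend can be sketched as a simple curve fit. The numbers below are made up purely for illustration (they are not the paper's results); the fit assumes performance ≈ m · log(N) + c, the functional form the paper reports.

```python
import numpy as np

# Hypothetical (training-set size in millions of images, score) pairs,
# chosen only to illustrate a log-linear trend like the one reported.
sizes = np.array([10.0, 30.0, 100.0, 300.0])
scores = np.array([31.2, 33.9, 36.8, 39.5])

# Fit performance = m * log10(size) + c via least squares.
m, c = np.polyfit(np.log10(sizes), scores, deg=1)

def predict(n_millions):
    """Predicted score for a training set of n_millions images."""
    return m * np.log10(n_millions) + c
```

Under this form, each order-of-magnitude increase in data adds a roughly constant amount `m` to the score, which is why gains have not saturated even at 300M images.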