Rethinking ImageNet Pre-training


21 Nov 2018 | Kaiming He, Ross Girshick, Piotr Dollár
This paper challenges the conventional wisdom that ImageNet pre-training is essential for achieving high performance on computer vision tasks such as object detection and instance segmentation. The authors demonstrate that models trained from random initialization, without ImageNet pre-training, achieve competitive results on the COCO dataset, often matching or exceeding the performance of their pre-trained counterparts. This holds even when using standard models and training schedules that were originally tuned for fine-tuning pre-trained networks, when training on only 10% of the COCO data, for deeper and wider models, and across multiple tasks and metrics.

ImageNet pre-training speeds up convergence early in training, but it does not necessarily improve final task accuracy or provide regularization. Training from scratch is effective when two ingredients are in place: a normalization scheme that works with the small batch sizes typical of detection training, such as Group Normalization (GroupNorm), and a schedule long enough for the randomly initialized model to see a comparable amount of data. With these ingredients, large models trained from scratch with GroupNorm (the paper's strongest result uses a ResNeXt-152 backbone) reach 50.9 AP on COCO object detection without any external data, on par with the top COCO 2017 competition results that used ImageNet pre-training.
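As a concrete illustration of the normalization point, the minimal PyTorch sketch below swaps BatchNorm for GroupNorm in a convolutional block. This is not code from the paper; the layer sizes and the 32-group setting are illustrative assumptions, chosen only to show that GroupNorm computes its statistics per sample and therefore behaves the same at the batches of one or two images per GPU that detection training typically uses.

```python
# Minimal sketch (assumes PyTorch): a conv block normalized with GroupNorm
# instead of BatchNorm. GroupNorm normalizes each sample over channel groups,
# so its statistics do not degrade at detection-style batch sizes of 1-2
# images per GPU, which is what makes training detectors from scratch practical.
import torch
import torch.nn as nn

def conv_gn_relu(in_channels: int, out_channels: int, num_groups: int = 32) -> nn.Sequential:
    """3x3 convolution followed by GroupNorm and ReLU (illustrative sizes)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(num_groups, out_channels),  # batch-size independent normalization
        nn.ReLU(inplace=True),
    )

if __name__ == "__main__":
    block = conv_gn_relu(64, 128)
    x = torch.randn(2, 64, 56, 56)   # a "batch" of just 2 images, as in detection training
    print(block(x).shape)            # torch.Size([2, 128, 56, 56])
```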
The study also finds that ImageNet pre-training shows no benefit on tasks that are more sensitive to spatially well-localized predictions, such as keypoint detection; models trained from scratch reach comparable performance in these cases without additional regularization. Large models trained from scratch, including those with ResNeXt-152 backbones, likewise reach high accuracy without overfitting, even when trained on a reduced dataset.

The paper argues that the pre-training-and-fine-tuning paradigm is not necessary for every task, especially when sufficient target-task data and computation are available. It suggests that training directly on the target task is a viable alternative, particularly when there is a significant gap between the pre-training task and the target task. These findings challenge the assumption that ImageNet pre-training is a fundamental component of computer vision systems and encourage a reevaluation of the role of pre-training in the field.
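To make the "sufficiently long schedule" point above more tangible, here is a rough back-of-the-envelope sketch, not numbers quoted from the paper: it assumes the common Detectron-style convention of a 90k-iteration "1x" schedule at 16 images per minibatch, and roughly 118k images in the COCO training set.

```python
# Back-of-the-envelope sketch of the "see enough data" argument: a model
# trained from scratch needs a longer COCO schedule before the total number
# of training images it has processed becomes comparable to a fine-tuned
# model that already consumed ImageNet during pre-training.
# Assumptions (not figures from the paper): "1x" = 90,000 iterations at
# 16 images per minibatch, and ~118,000 images in COCO train2017.
IMS_PER_BATCH = 16
ITERS_1X = 90_000
COCO_TRAIN_IMAGES = 118_000

for multiplier in (1, 2, 4, 6):
    images_seen = multiplier * ITERS_1X * IMS_PER_BATCH
    passes = images_seen / COCO_TRAIN_IMAGES
    print(f"{multiplier}x schedule: ~{images_seen / 1e6:.1f}M images seen "
          f"(~{passes:.0f} passes over COCO)")
```

Under these assumptions, the longer from-scratch schedules mainly compensate for the data a pre-trained model has already seen, which matches the paper's convergence argument rather than suggesting a higher accuracy ceiling.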