21 Nov 2018 | Kaiming He, Ross Girshick, Piotr Dollár
The paper "Rethinking ImageNet Pre-training" by Kaiming He, Ross Girshick, Piotr Dollár, and others from Facebook AI Research (FAIR) challenges the conventional wisdom of using ImageNet pre-training for object detection and instance segmentation tasks. The authors report competitive results on the COCO dataset using standard models trained from random initialization, even when using hyper-parameters optimized for fine-tuning pre-trained models. They find that training from random initialization is surprisingly robust, achieving comparable accuracy to pre-trained models with minimal adjustments. Key findings include:
1. **Robustness to Random Initialization**: Models trained from random initialization can reach accuracy similar to those pre-trained on ImageNet, even when using the same hyper-parameters that were optimized for fine-tuning (a minimal setup sketch follows this list).
2. **Speed of Convergence**: ImageNet pre-training speeds up convergence early in training but does not necessarily improve final task accuracy. Training from random initialization can catch up after a sufficient number of iterations.
3. **Regularization and Task Sensitivity**: ImageNet pre-training does not provide additional regularization, and it gives no benefit on tasks and metrics that are more sensitive to precise spatial localization, such as box AP at high overlap thresholds and keypoint detection.
4. **Data Efficiency**: Even with roughly 10% of the COCO training data, models trained from scratch match their fine-tuned counterparts, indicating that ImageNet pre-training does not automatically help when less target-task data is available.
5. **Large Models**: Training large models (up to 4× larger than ResNet-101) from scratch is possible without overfitting, further challenging the need for pre-training.
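To make findings 1 and 2 concrete, here is a minimal sketch using PyTorch/torchvision's detection API rather than the authors' Detectron-based setup: the same Mask R-CNN architecture is built once with an ImageNet-pre-trained backbone and once from pure random initialization. The `maskrcnn_resnet50_fpn` builder, the weight enum, and `num_classes=91` (COCO's convention in torchvision) are torchvision details, not taken from the paper.

```python
from torchvision.models import ResNet50_Weights
from torchvision.models.detection import maskrcnn_resnet50_fpn

# (a) Conventional recipe: the backbone starts from ImageNet classification
#     weights and the whole detector is then fine-tuned on COCO.
finetuned = maskrcnn_resnet50_fpn(
    weights=None,                                      # no COCO-pre-trained detector weights
    weights_backbone=ResNet50_Weights.IMAGENET1K_V1,   # ImageNet pre-training
    num_classes=91,
)

# (b) "From scratch": every layer, backbone included, starts from random
#     initialization, as studied in the paper.
from_scratch = maskrcnn_resnet50_fpn(
    weights=None,
    weights_backbone=None,                             # random initialization
    num_classes=91,
)

# Per finding 2, (b) is trained with the same hyper-parameters as (a) but for
# more iterations (the paper uses schedules up to ~6x the standard length)
# before its final accuracy catches up with the fine-tuned model.
```

Apart from where the weights come from and how long training runs, the architecture, losses, and optimization hyper-parameters are kept the same, which is the comparison the paper makes.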
The authors conclude that ImageNet pre-training is a historical workaround and that collecting data for and training directly on the target task can be more effective, especially when there is a significant gap between the source pre-training task and the target task. They encourage the community to rethink the current 'pre-training and fine-tuning' paradigm in computer vision.