5 Dec 2018 | Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
This paper explores a collection of refinements to the training procedures and model architectures of convolutional neural networks (CNNs) to improve their performance on image classification tasks. The authors evaluate these "tricks" through ablation studies, showing that stacking multiple refinements significantly improves model accuracy: applied to ResNet-50, they raise top-1 validation accuracy on ImageNet from 75.3% to 79.29%. The improvements also carry over to other networks, such as Inception-V3 and MobileNet, and to other datasets, such as Places365. In addition, the enhanced models transfer better to downstream tasks, including object detection and semantic segmentation. The techniques discussed include large-batch training, low-precision training, model tweaks, and training refinements such as cosine learning rate decay, label smoothing, knowledge distillation, and mixup training. The results highlight the effectiveness of these methods in improving both accuracy and computational efficiency.
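To make three of the named training refinements concrete, here is a minimal NumPy sketch of cosine learning rate decay, label smoothing, and mixup. It assumes the commonly reported settings from the paper (smoothing ε = 0.1, mixup α = 0.2); the function names and the standalone, per-example formulation are illustrative rather than taken from the authors' code.

```python
import numpy as np

def cosine_lr(t, total_steps, base_lr):
    # Cosine decay: lr starts at base_lr and smoothly approaches 0 at the end of training.
    return 0.5 * (1 + np.cos(np.pi * t / total_steps)) * base_lr

def smooth_labels(y, num_classes, eps=0.1):
    # Label smoothing: replace the one-hot target with a softened distribution,
    # putting 1 - eps on the true class and eps / (K - 1) on every other class.
    q = np.full(num_classes, eps / (num_classes - 1))
    q[y] = 1.0 - eps
    return q

def mixup(x1, y1, x2, y2, alpha=0.2, rng=np.random.default_rng()):
    # Mixup: blend two training examples and their one-hot labels with a
    # Beta(alpha, alpha)-sampled weight, training on the interpolated pair.
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

In practice the schedule is queried once per iteration (or per epoch), the smoothed targets replace one-hot labels in the cross-entropy loss, and mixup is applied to whole mini-batches rather than single examples; the sketch only shows the underlying arithmetic.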