22 Apr 2024 | Zhenhua Liu1*, Zhiwei Hao2*, Kai Han1**, Yehui Tang1, and Yunhe Wang1**
This paper explores training strategies for compact neural networks, focusing on the GhostNetV3 model. The authors investigate how re-parameterization, knowledge distillation, learning schedules, and data augmentation affect the performance of compact models. They find that re-parameterization, particularly with a parallel 1×1 depth-wise convolution branch, significantly improves performance, and that knowledge distillation from a well-performing teacher model further enhances accuracy. The proposed training strategy is applied to several architectures, including GhostNetV3, MobileNetV2, and ShuffleNetV2, yielding significant improvements in the trade-off between top-1 accuracy and latency. The results demonstrate that the strategy effectively balances accuracy and inference speed, making compact models more suitable for edge devices. The paper also extends these findings to object detection, further validating the effectiveness of the proposed training methods.
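To make the re-parameterization idea concrete, below is a minimal PyTorch sketch (not the paper's implementation) of a depth-wise 3×3 convolution trained with a parallel 1×1 depth-wise branch, where both branches are folded into a single 3×3 kernel at inference so the deployed model pays no extra cost. The module name `RepDWConv`, the branch layout, and the fusion helper are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RepDWConv(nn.Module):
    """Hypothetical re-parameterizable depth-wise conv: 3x3 + 1x1 branches
    (each followed by BatchNorm) are summed during training and fused into
    one 3x3 depth-wise conv for inference."""

    def __init__(self, channels: int):
        super().__init__()
        self.dw3 = nn.Conv2d(channels, channels, 3, padding=1,
                             groups=channels, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.dw1 = nn.Conv2d(channels, channels, 1,
                             groups=channels, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Training-time forward: sum of the two parallel branches.
        return self.bn3(self.dw3(x)) + self.bn1(self.dw1(x))

    @torch.no_grad()
    def fuse(self) -> nn.Conv2d:
        """Fold BatchNorm into each branch, zero-pad the 1x1 kernel to 3x3,
        and return a single equivalent depth-wise convolution."""
        def fold(conv, bn, pad):
            std = (bn.running_var + bn.eps).sqrt()
            w = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)
            b = bn.bias - bn.running_mean * bn.weight / std
            return nn.functional.pad(w, [pad] * 4), b

        w3, b3 = fold(self.dw3, self.bn3, 0)
        w1, b1 = fold(self.dw1, self.bn1, 1)
        channels = w3.size(0)
        fused = nn.Conv2d(channels, channels, 3, padding=1,
                          groups=channels, bias=True)
        fused.weight.copy_(w3 + w1)
        fused.bias.copy_(b3 + b1)
        return fused
```

Because both branches are linear in the input, their fused sum produces exactly the same outputs as the training-time module, which is what lets the extra branch improve optimization without adding inference latency.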
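Similarly, the knowledge-distillation component can be illustrated with the standard soft-label objective: a cross-entropy term on ground-truth labels plus a temperature-scaled KL term that matches the teacher's predictions. This is a generic sketch under assumed hyperparameters (`alpha`, `tau`), not the paper's exact loss or settings.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      alpha: float = 0.5,
                      tau: float = 4.0) -> torch.Tensor:
    """Hypothetical KD loss: hard-label cross-entropy blended with a
    temperature-softened KL divergence to the teacher's distribution.
    alpha and tau are illustrative values, not the paper's settings."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau  # rescale gradients to match the hard-label term
    return (1.0 - alpha) * ce + alpha * kd
```

In practice the teacher runs in evaluation mode with gradients disabled, so only the compact student is updated.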