VOL. 43, NO. 2, FEB. 2021 | Shang-Hua Gao*, Ming-Ming Cheng*, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, and Philip Torr
Res2Net is a novel multi-scale backbone architecture for convolutional neural networks (CNNs) that strengthens multi-scale feature representation. It introduces hierarchical residual-like connections within a single residual block. These connections represent multi-scale features at a granular level and enlarge the range of receptive fields available to each network layer. The Res2Net block can be plugged into existing state-of-the-art CNN backbones such as ResNet, ResNeXt, and DLA. Experiments show consistent performance gains over these baselines on widely used datasets such as CIFAR-100 and ImageNet.
Ablation studies and experiments on representative tasks, namely object detection, class activation mapping, and salient object detection, further confirm the advantage of Res2Net over existing baseline methods. The scale dimension introduced by the Res2Net module is orthogonal to the existing dimensions of depth, width, and cardinality, and increasing scale yields more effective performance gains. The module can also be combined with other modern modules, such as the cardinality dimension (as in ResNeXt) and squeeze-and-excitation (SE) blocks. It applies to a range of tasks, including image classification, object detection, semantic segmentation, and salient object detection. Across multiple datasets it delivers consistent improvements, particularly where multi-scale representation matters. The source code and trained models are publicly available for further research.
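The hierarchical residual-like connections inside one block can be illustrated with a minimal pure-Python sketch. It assumes the split-and-add formulation y_1 = x_1, y_2 = K_2(x_2), y_i = K_i(x_i + y_{i-1}) for 2 < i <= s, where the block's channels are split into s groups. The sketch simplifies heavily. Feature maps are 1-D. Every K_i is the same fixed all-ones 3-tap kernel applied per channel instead of a learned 3x3 convolution. The 1x1 convolutions, normalization, activations, and identity shortcut around the split are omitted. The names `conv3`, `add`, `res2net_split`, and `impulse_spread` are hypothetical helpers for this illustration, not part of the released Res2Net code.

```python
from typing import List, Sequence

Signal = List[float]        # one channel: a 1-D row of activations
FeatureMap = List[Signal]   # C channels of equal length


def conv3(signal: Signal, kernel: Sequence[float] = (1.0, 1.0, 1.0)) -> Signal:
    """3-tap 1-D cross-correlation with zero padding (toy stand-in for a 3x3 conv)."""
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = t + k - 1  # taps at offsets -1, 0, +1 around position t
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two feature maps with identical shapes."""
    return [[u + v for u, v in zip(ca, cb)] for ca, cb in zip(a, b)]


def res2net_split(x: FeatureMap, scale: int) -> FeatureMap:
    """Toy hierarchical residual-like connections inside one block.

    Splits the C input channels into `scale` groups x_1..x_s of equal width and computes
        y_1 = x_1
        y_2 = K(x_2)
        y_i = K(x_i + y_{i-1})   for 2 < i <= s
    where K is the shared toy kernel `conv3` (an assumption; the real K_i are learned
    and distinct), then concatenates y_1..y_s along the channel axis.
    """
    if scale < 1 or len(x) % scale != 0:
        raise ValueError("channel count must be a positive multiple of scale")
    width = len(x) // scale
    groups = [x[g * width:(g + 1) * width] for g in range(scale)]
    ys: List[FeatureMap] = []
    for i, xi in enumerate(groups, start=1):
        if i == 1:
            yi = xi                                      # first split passes through unchanged
        elif i == 2:
            yi = [conv3(ch) for ch in xi]                # y_2 = K(x_2)
        else:
            yi = [conv3(ch) for ch in add(xi, ys[-1])]   # y_i = K(x_i + y_{i-1})
        ys.append(yi)
    return [ch for yi in ys for ch in yi]                # concatenate along channels


def impulse_spread(scale: int, width: int = 1, length: int = 15) -> List[int]:
    """Number of positions each split's output touches for a centered unit impulse.

    `length` should be at least 2 * scale - 1 so the spread is not clipped at the borders.
    """
    center = length // 2
    impulse = [1.0 if t == center else 0.0 for t in range(length)]
    x = [list(impulse) for _ in range(scale * width)]
    y = res2net_split(x, scale)
    # Inspect the first channel of each output group.
    return [sum(1 for v in y[g * width] if v != 0.0) for g in range(scale)]


if __name__ == "__main__":
    print(impulse_spread(4))  # [1, 3, 5, 7]
```

Feeding a centered impulse, `impulse_spread(4)` returns `[1, 3, 5, 7]`. Each additional split sees two more positions than the previous one, so a single block outputs features with several receptive-field sizes at once. With the all-ones kernel there is no cancellation, so the count of nonzero outputs equals the receptive-field width. With 3x3 kernels on 2-D maps, the same growth applies along each spatial axis.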