Aggregated Residual Transformations for Deep Neural Networks

11 Apr 2017 | Saining Xie¹, Ross Girshick², Piotr Dollár², Zhuowen Tu¹, Kaiming He²
This paper introduces ResNeXt, a deep neural network architecture for image classification. ResNeXt is built by repeating a building block that aggregates a set of transformations with the same topology. The resulting design is simple and highly modularized, with only a few hyper-parameters to set, and its template extends easily to any number of transformations. The key innovation is "cardinality", the size of the set of transformations, as an essential dimension in addition to depth and width. Experiments on the ImageNet-1K dataset show that increasing cardinality improves classification accuracy even when model complexity is held constant, and that it is more effective than going deeper or wider. ResNeXt outperforms its ResNet counterpart and Inception models on the ImageNet classification and COCO object detection tasks, and is also evaluated on the larger ImageNet-5K set and on CIFAR, where it achieves state-of-the-art results. The architecture formed the foundation of the authors' entry to the ILSVRC 2016 classification task, where they secured second place; the code and models are publicly available. The paper also discusses related work, including multi-branch convolutional networks, grouped convolutions, and ensembling, along with implementation details for all experiments.
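The aggregated-transformation idea can be illustrated with a toy numeric sketch of the block's formula, y = x + Σᵢ Tᵢ(x), where C branches of identical topology are summed and added to the input. The branch structure and all weights below are illustrative stand-ins, not the paper's actual convolutional layers:

```python
# Toy sketch of ResNeXt's aggregated residual transformation:
#   y = x + sum_{i=1}^{C} T_i(x),  C = cardinality.
# Each branch T_i shares the same topology (here: a scalar "reduce"
# followed by a scalar "expand", standing in for the 1x1 conv paths).

def branch(x, w_reduce, w_expand):
    """One low-dimensional path: project down, then project back up."""
    reduced = [xi * w_reduce for xi in x]      # stand-in for 1x1 reduce
    return [ri * w_expand for ri in reduced]   # stand-in for 1x1 expand

def resnext_block(x, cardinality=4):
    # Every branch has the same topology; only its (toy) weights differ.
    weights = [(0.5, 0.1 * (i + 1)) for i in range(cardinality)]
    aggregated = [0.0] * len(x)
    for w_reduce, w_expand in weights:
        out = branch(x, w_reduce, w_expand)
        aggregated = [a + o for a, o in zip(aggregated, out)]
    # Residual connection: add the input back to the aggregated output.
    return [xi + ai for xi, ai in zip(x, aggregated)]

y = resnext_block([1.0, 2.0, 3.0], cardinality=4)
```

In the real architecture each branch is a bottleneck of convolutions and the aggregation is equivalently implemented as a single grouped convolution, which is why cardinality adds capacity without adding FLOPs when width is reduced accordingly.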