Wider or Deeper: Revisiting the ResNet Model for Visual Recognition


30 Nov 2016 | Zifeng Wu, Chunhua Shen, and Anton van den Hengel
The paper "Wider or Deeper: Revisiting the ResNet Model for Visual Recognition" by Zifeng Wu, Chunhua Shen, and Anton van den Hengel explores the effectiveness of deep residual networks (ResNets) and proposes a new, shallower architecture that outperforms deeper models like ResNet-200 on the ImageNet classification dataset. The authors argue that increasing depth alone may not be the best strategy for improving performance, as it can lead to overfitting and increased computational costs. They introduce a new interpretation of ResNets, suggesting that they operate as an ensemble of relatively shallow networks rather than a single deep network. This interpretation helps explain the behaviors observed in experimental results and leads to the development of a new architecture that is both more efficient in memory use and faster in training. The proposed architecture is evaluated on various datasets, including ImageNet, PASCAL VOC, PASCAL Context, and Cityscapes, where it achieves state-of-the-art performance in semantic image segmentation tasks. The paper also discusses the trade-offs between width and depth in network design and provides detailed experimental results to support its findings.The paper "Wider or Deeper: Revisiting the ResNet Model for Visual Recognition" by Zifeng Wu, Chunhua Shen, and Anton van den Hengel explores the effectiveness of deep residual networks (ResNets) and proposes a new, shallower architecture that outperforms deeper models like ResNet-200 on the ImageNet classification dataset. The authors argue that increasing depth alone may not be the best strategy for improving performance, as it can lead to overfitting and increased computational costs. They introduce a new interpretation of ResNets, suggesting that they operate as an ensemble of relatively shallow networks rather than a single deep network. This interpretation helps explain the behaviors observed in experimental results and leads to the development of a new architecture that is both more efficient in memory use and faster in training. The proposed architecture is evaluated on various datasets, including ImageNet, PASCAL VOC, PASCAL Context, and Cityscapes, where it achieves state-of-the-art performance in semantic image segmentation tasks. The paper also discusses the trade-offs between width and depth in network design and provides detailed experimental results to support its findings.