28 May 2017 | Fisher Yu, Vladlen Koltun, Thomas Funkhouser
Dilated Residual Networks (DRNs) improve image classification and downstream tasks such as object localization and semantic segmentation without increasing model depth or complexity. Traditional convolutional networks progressively reduce image resolution, discarding spatial information in a way that can hinder accuracy and transfer to tasks requiring detailed scene understanding. DRNs use dilation to maintain high spatial resolution while preserving the receptive field of each neuron, retaining the spatial detail those tasks need.
DRNs are constructed by replacing some subsampling layers in residual networks with dilated convolutions. This increases output resolution without reducing receptive field, leading to improved classification accuracy. For example, DRNs achieve higher accuracy on ImageNet compared to non-dilated counterparts. The output resolution of DRNs on typical ImageNet input is 28×28, comparable to small thumbnails that convey image structure.
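The trade-off between subsampling and dilation can be illustrated in one dimension. The sketch below is a simplified numpy illustration (not the paper's implementation): a strided convolution halves the output resolution, while a dilated convolution of the same kernel size keeps output resolution high by spreading the kernel taps apart instead of striding.

```python
import numpy as np

def conv1d(x, w, stride=1, dilation=1):
    """Valid 1-D convolution with stride and dilation (no padding)."""
    k = len(w)
    span = dilation * (k - 1) + 1      # input samples covered by one output
    out = []
    for i in range(0, len(x) - span + 1, stride):
        out.append(sum(x[i + j * dilation] * w[j] for j in range(k)))
    return np.array(out)

x = np.arange(16, dtype=float)
w = np.ones(3)

strided = conv1d(x, w, stride=2)       # subsampling: resolution is halved
dilated = conv1d(x, w, dilation=2)     # dilation: resolution stays high,
                                       # each output still spans 5 inputs
```

Stacking such dilated layers in place of the strided ones is what lets a DRN keep 28×28 output maps on ImageNet-sized inputs without shrinking the network's receptive field.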
Dilation can introduce gridding artifacts, which are addressed through a 'degridding' scheme that further improves DRN performance. DRNs also outperform ResNet models on downstream tasks such as object localization and semantic segmentation: a 42-layer DRN outperforms a ResNet-101 baseline on the Cityscapes dataset by more than 4 percentage points, despite having far fewer layers.
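The gridding artifact has a simple 1-D analogue, sketched below with numpy under simplified assumptions (this is an illustration of the phenomenon, not the paper's degridding architecture): a dilated filter applied to an isolated activation responds only on a sparse grid of positions, and a subsequent lower-dilation convolution fills in the gaps, loosely mirroring the degridding layers of decreasing dilation added at the end of the network.

```python
import numpy as np

def dilate_kernel(w, d):
    """Zero-stuff a kernel: insert d-1 zeros between taps (dilation d)."""
    k = np.zeros(d * (len(w) - 1) + 1)
    k[::d] = w
    return k

impulse = np.zeros(15)
impulse[7] = 1.0                       # a single isolated activation

# A dilation-2 filter responds only at every other position around the
# impulse -- the 1-D analogue of the gridding artifact.
g = np.convolve(impulse, dilate_kernel(np.ones(3), 2), mode="same")

# A subsequent ordinary (dilation-1) convolution fills in the gaps,
# loosely mirroring degridding layers with decreasing dilation.
smooth = np.convolve(g, np.ones(3), mode="same")
```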
DRNs can be directly used for weakly-supervised object localization and semantic segmentation without additional training. They produce high-resolution activation maps that enable accurate object localization and segmentation. DRN-C-26, a degridded version, outperforms deeper models like DRN-A-50 and ResNet-101 in these tasks.
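The reason no additional training is needed is that global average pooling followed by a linear classifier commutes with the spatial dimensions: projecting the classifier weights onto the feature maps before pooling yields a per-class activation map. The numpy sketch below illustrates this idea with made-up shapes and random weights (the 28×28 resolution matches the DRN output maps described above; everything else is hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: C final feature channels at 28x28 resolution,
# followed by global average pooling and a linear classifier.
C, H, W, n_classes = 8, 28, 28, 10
features = rng.random((C, H, W))
cls_weights = rng.random((n_classes, C))

# Activation map for one class: weight the feature maps by that class's
# classifier weights *before* pooling -- no extra training required.
c = 3                                               # some class index
cam = np.tensordot(cls_weights[c], features, axes=1)  # shape (H, W)

# Sanity check: the class score is the spatial mean of its map, because
# average pooling and the linear layer commute.
score = cls_weights[c] @ features.mean(axis=(1, 2))
```

Because DRN feature maps are high-resolution, such maps localize objects far more precisely than the coarse maps of a conventional downsampling network.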
The results show that DRNs are effective for image analysis tasks involving complex natural images, particularly when detailed scene understanding is required. The approach is supported by code and pretrained models for future research and applications.