28 Nov 2018 | Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai
The paper "Deformable ConvNets v2: More Deformable, Better Results" by Xizhou Zhu, Han Hu, Stephen Lin, and Jifeng Dai addresses the limitations of Deformable Convolutional Networks (DCNv1) in object detection and instance segmentation. DCNv1 improves the ability of CNNs to adapt to geometric variations, but its spatial support often extends beyond the region of interest, so irrelevant image content can influence the learned features. To address this, the authors propose DCNv2, which introduces two main enhancements:
1. **Enhanced Modeling Power**: DCNv2 applies deformable convolution layers more extensively than DCNv1, replacing regular convolutions throughout the conv3–conv5 stages of the backbone. Stacking more deformable layers gives the network broader control over its sampling and pooling patterns across a wider range of feature levels.
2. **Modulation Mechanism**: A modulation mechanism lets each sampled location be scaled by a learned amplitude in addition to being shifted by a learned offset. This gives the network finer control over both the spatial distribution and the relative influence of its samples; a modulation scalar near zero effectively removes a sample's contribution.
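The two mechanisms above combine into a single sampling rule: each kernel tap reads the input at its regular grid position plus a learned offset, and its contribution is scaled by a learned modulation scalar. The following is a minimal single-channel NumPy sketch of that rule, with stride 1 and zero padding; the function names and shape conventions are illustrative choices, not the paper's optimized CUDA implementation.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly interpolate img (H, W) at fractional location (y, x).
    Out-of-bounds neighbors contribute 0 (zero-padding convention)."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < H and 0 <= xx < W:
                val += wy * wx * img[yy, xx]
    return val

def modulated_deform_conv2d(x, weight, offset, mask):
    """Single-channel modulated deformable convolution (DCNv2 style), stride 1.

    x:      (H, W)          input feature map
    weight: (k, k)          convolution kernel
    offset: (H, W, k*k, 2)  learned (dy, dx) for each kernel tap
    mask:   (H, W, k*k)     learned modulation scalar in [0, 1] per tap

    With all-zero offsets and an all-ones mask this reduces to a
    regular zero-padded convolution.
    """
    H, W = x.shape
    k = weight.shape[0]
    r = k // 2
    out = np.zeros((H, W))
    for py in range(H):
        for px in range(W):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    s = i * k + j
                    dy, dx = offset[py, px, s]
                    # regular grid position + learned offset
                    sy = py + (i - r) + dy
                    sx = px + (j - r) + dx
                    # modulation scales this tap's contribution
                    acc += weight[i, j] * mask[py, px, s] * \
                        bilinear_sample(x, sy, sx)
            out[py, px] = acc
    return out
```

In a real network, the offsets and masks are produced per location by a small companion convolution branch and trained end to end; the sketch takes them as given inputs to keep the sampling rule itself in focus.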
To effectively use this added capacity, the authors propose a feature-mimicking scheme inspired by knowledge distillation. An auxiliary R-CNN branch, whose per-RoI features are known to focus on the relevant image region, guides training: the detector's RoI features are encouraged to match those of the R-CNN branch. The feature-mimicking loss is enforced only on positive RoIs that sufficiently overlap ground-truth objects.
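The restriction to positive RoIs can be sketched as a small masking step around the loss. Below is a NumPy illustration assuming a cosine-similarity formulation of the mimicking loss between detector and R-CNN-branch RoI features; the function name and the exact IoU threshold are illustrative assumptions.

```python
import numpy as np

def feature_mimic_loss(f_det, f_rcnn, ious, iou_thresh=0.5):
    """Cosine-similarity feature-mimicking loss over positive RoIs only.

    f_det:  (N, D) per-RoI features from the detector head being trained
    f_rcnn: (N, D) per-RoI features from the auxiliary R-CNN branch
    ious:   (N,)   IoU of each RoI with its best-matching ground-truth box

    RoIs below iou_thresh (background) are excluded, so the detector is
    only pushed to mimic R-CNN features on foreground regions.
    """
    pos = ious >= iou_thresh
    if not np.any(pos):
        return 0.0
    a, b = f_det[pos], f_rcnn[pos]
    # 1 - cosine similarity, averaged over the positive RoIs
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
    )
    return float(np.mean(1.0 - cos))
```

The loss is zero when detector and teacher features align and grows toward one as they become orthogonal, so minimizing it pulls the detector's RoI features toward the R-CNN branch's focused representations without constraining their magnitude.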
Experiments on the COCO benchmark demonstrate significant performance improvements over DCNv1, with DCNv2 achieving leading results in object detection and instance segmentation. Ablation studies and comparisons with regular ConvNets and stronger backbones show that DCNv2 delivers higher accuracy at comparable computational cost.