25 Apr 2019 | Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
GCNet is a backbone architecture that improves global context modeling for visual recognition tasks. The paper starts from the non-local network (NLNet), which captures long-range dependencies by aggregating a query-specific global context for each query position. The authors found, however, that the global contexts modeled by NLNet are almost the same for different query positions. This observation leads to a simplified, query-independent network that maintains accuracy with significantly less computation. The simplified design shares a similar structure with the Squeeze-Excitation Network (SENet), and the two are unified into a three-step general framework for global context modeling.
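The query-independent simplification can be illustrated with a minimal sketch. It assumes a feature map flattened to N positions with C channels each, a single attention map shared by every query position, and a linear transform applied once to the pooled context. The names `global_attention_pool`, `simplified_nl_block`, `w_k`, and `w_v` are hypothetical, and plain Python lists are used only to keep the example dependency-free; a real implementation would use a tensor library.

```python
import math

def global_attention_pool(x, w_k):
    """Pool N position features into one context vector.

    x:   list of N positions, each a list of C floats.
    w_k: list of C floats (hypothetical weights of a 1x1 conv that
         produces one attention logit per position).
    """
    logits = [sum(w * v for w, v in zip(w_k, pos)) for pos in x]
    # Numerically stable softmax over positions.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    attn = [e / total for e in exps]
    channels = len(x[0])
    # Weighted sum of position features: the same context for every query.
    return [sum(a * pos[c] for a, pos in zip(attn, x)) for c in range(channels)]

def simplified_nl_block(x, w_k, w_v):
    """Add one shared, transformed context vector to every position.

    w_v: C x C matrix (list of rows), an assumed linear transform that is
         applied once to the pooled context rather than once per position.
    """
    context = global_attention_pool(x, w_k)
    transformed = [sum(w * c for w, c in zip(row, context)) for row in w_v]
    return [[v + t for v, t in zip(pos, transformed)] for pos in x]
```

Because the context is computed once rather than once per query position, the attention cost no longer grows with the square of the number of positions, which is where the computational saving in this sketch comes from.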
The authors propose a new instantiation of this framework, called the global context (GC) block, which is lightweight and can effectively model global context. The GC block is applied to multiple layers in a backbone network to construct a global context network (GCNet), which generally outperforms both simplified NLNet and SENet on major benchmarks for various recognition tasks.
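As a rough illustration of how a GC block could follow the three-step framework, the sketch below chains context modeling, a feature transform, and fusion. The specific choices here (softmax attention pooling, a bottleneck transform with layer normalization and ReLU, and fusion by broadcast addition) reflect one reading of the paper and are assumptions not spelled out in this summary; `gc_block`, `w_k`, `w_down`, and `w_up` are hypothetical names, and learnable normalization parameters and biases are omitted.

```python
import math

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def gc_block(x, w_k, w_down, w_up):
    """x: N positions x C channels; w_k: C; w_down: (C/r) x C; w_up: C x (C/r)."""
    # Step 1, context modeling: one attention-weighted average over positions.
    attn = softmax([sum(w * v for w, v in zip(w_k, pos)) for pos in x])
    channels = len(x[0])
    context = [sum(a * pos[c] for a, pos in zip(attn, x)) for c in range(channels)]
    # Step 2, transform: bottleneck (reduce, normalize, ReLU, expand).
    hidden = [max(0.0, h) for h in layer_norm(matvec(w_down, context))]
    delta = matvec(w_up, hidden)
    # Step 3, fusion: add the same transformed context to every position.
    return [[v + d for v, d in zip(pos, delta)] for pos in x]
```

The output has the same shape as the input, so a block like this can be dropped into an existing residual stage without changing the surrounding layers.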
Because the GC block is lightweight, it can be applied to all residual blocks in the ResNet architecture. GCNet outperforms NLNet and SENet on COCO object detection/segmentation, ImageNet image classification, and Kinetics action recognition with only a small increase in computation cost. The GC block also captures long-range dependencies effectively and aids network training.
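To give a feel for why a bottleneck keeps such a block lightweight, the following back-of-the-envelope sketch compares the weight count of a full C x C transform with that of a two-layer bottleneck of reduction ratio r. The channel width of 2048 and the ratio of 16 are illustrative assumptions, `transform_params` is a hypothetical helper, and biases and normalization parameters are ignored.

```python
def transform_params(channels, ratio=None):
    """Count weights in the context transform.

    ratio=None models a single C x C linear layer; otherwise a
    bottleneck of two layers, C -> C/ratio -> C.
    """
    if ratio is None:
        return channels * channels
    hidden = channels // ratio
    return 2 * channels * hidden

full = transform_params(2048)
bottleneck = transform_params(2048, ratio=16)
print(full, bottleneck, full // bottleneck)
```

Under these assumptions the full transform has 4,194,304 weights and the bottleneck has 524,288, an 8x reduction per block, which matters when a block is inserted into every residual unit.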
The paper also presents an ablation study on the GC block, showing that it is effective in capturing global context. When applied to stronger backbones, such as ResNet-101 and ResNeXt-101, the GC block still achieves significant improvements in performance.
The paper concludes that GCNet is an effective approach for global context modeling, outperforming both simplified NLNet and SENet on various tasks. Because the GC block is lightweight and can be applied to multiple layers in a backbone network, it is a promising approach for visual recognition tasks.