GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

25 Apr 2019 | Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu
The paper "GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond" addresses the issue of long-range dependency modeling in visual recognition tasks. The authors start by analyzing the Non-Local Network (NLNet), which aims to capture global context by aggregating query-specific global contexts. However, they find that the global contexts modeled by NLNet are nearly identical for different query positions within an image. This observation leads to a simplified version of NLNet, which uses a query-independent attention map, significantly reducing computational cost while maintaining accuracy. The authors further observe that this simplified design shares similarities with the Squeeze-Excitation Network (SENet). They unify these two approaches into a three-step general framework for global context modeling: context modeling, feature transformation, and fusion. Within this framework, they design a lightweight global context (GC) block, which effectively models global context and can be applied to multiple layers in a backbone network to construct a global context network (GCNet). GCNet outperforms both simplified NLNet and SENet on major benchmarks for various recognition tasks, including object detection, instance segmentation, image classification, and action recognition. The code and configurations are released on GitHub for further research.The paper "GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond" addresses the issue of long-range dependency modeling in visual recognition tasks. The authors start by analyzing the Non-Local Network (NLNet), which aims to capture global context by aggregating query-specific global contexts. However, they find that the global contexts modeled by NLNet are nearly identical for different query positions within an image. This observation leads to a simplified version of NLNet, which uses a query-independent attention map, significantly reducing computational cost while maintaining accuracy. The authors further observe that this simplified design shares similarities with the Squeeze-Excitation Network (SENet). They unify these two approaches into a three-step general framework for global context modeling: context modeling, feature transformation, and fusion. Within this framework, they design a lightweight global context (GC) block, which effectively models global context and can be applied to multiple layers in a backbone network to construct a global context network (GCNet). GCNet outperforms both simplified NLNet and SENet on major benchmarks for various recognition tasks, including object detection, instance segmentation, image classification, and action recognition. The code and configurations are released on GitHub for further research.