LocalMamba: Visual State Space Model with Windowed Selective Scan

14 Mar 2024 | Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, and Chang Xu
LocalMamba is a visual state space model that improves the capture of local dependencies in images while maintaining global context. The model introduces a local scanning strategy that divides images into distinct windows, enabling efficient capture of local dependencies. It also proposes a dynamic method that searches for the optimal scan directions independently for each layer, which significantly improves performance, and it incorporates a spatial and channel attention module (SCAttn) to enhance feature aggregation. Extensive experiments on image classification, object detection, and semantic segmentation demonstrate that LocalMamba outperforms traditional CNNs and Vision Transformers (ViTs). For example, LocalMamba outperforms Vim-Ti by 3.1% on ImageNet with the same 1.5G FLOPs. Together, these innovations make LocalMamba a powerful and efficient approach for visual tasks.
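The windowed scan can be illustrated with a small sketch. This is not the authors' implementation; it only shows one plausible way to order the tokens of an image grid so that each window is traversed completely before the next one begins. The function name `local_scan_order`, its parameters, and the row-major order inside and across windows are assumptions made for illustration.

```python
import numpy as np


def local_scan_order(height, width, window):
    """Return flat token indices in a windowed (local) scan order.

    Hypothetical helper: windows are visited in row-major order, and the
    tokens inside each window are also visited in row-major order.
    """
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    # Flat index of every token on the height x width grid.
    idx = np.arange(height * width).reshape(height, width)
    # Split each axis into (block index, offset inside the block).
    idx = idx.reshape(height // window, window, width // window, window)
    # Reorder to (block row, block col, row in block, col in block), flatten.
    return idx.transpose(0, 2, 1, 3).reshape(-1)


order = local_scan_order(4, 4, 2)
print(order.tolist())
# [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]

# The inverse permutation restores the original row-major layout after
# a sequence model has processed the reordered tokens.
inverse = np.argsort(order)
```

On a 4 x 4 grid with 2 x 2 windows, tokens 0, 1, 4, and 5 (the top-left window) become consecutive in the sequence, whereas a plain row-major scan would place tokens 1 and 4 three positions apart.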