EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba

15 Mar 2024 | Xiaohuan Pei*, Tao Huang*, and Chang Xu
EfficientVMamba is a lightweight model architecture that integrates an atrous-based selective scan with efficient skip sampling to reduce computational complexity while maintaining competitive performance. The model uses state-space models (SSMs) to capture global information in linear time ($\mathcal{O}(N)$ in the number of tokens $N$), which addresses the trade-off between accuracy and efficiency. EfficientVMamba introduces an efficient 2D scanning (ES2D) method, which reduces the number of tokens scanned in the spatial dimension, and combines it with a convolutional branch that extracts local features. Global SSM blocks are placed in the shallow, high-resolution layers, and efficient convolution blocks are placed in the deeper layers. Experimental results show that EfficientVMamba reduces computational complexity while improving accuracy on image classification, object detection, and semantic segmentation. The model is provided in three variants of different scales: EfficientVMamba-T, EfficientVMamba-S, and EfficientVMamba-B.
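
The skip-sampling idea behind ES2D can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`atrous_split`, `atrous_merge`, `placeholder_scan`, `es2d_sketch`), the step size of 2, and the running-mean stand-in for the selective scan are all assumptions made for illustration. The sketch only shows how strided sampling splits a feature map into sub-maps, so that each scan sees fewer tokens, and how the results are written back to their original positions.

```python
import numpy as np

def atrous_split(x, step=2):
    """Split a (C, H, W) feature map into step*step strided sub-maps (hypothetical helper)."""
    _, h, w = x.shape
    assert h % step == 0 and w % step == 0, "H and W must be divisible by step"
    return [x[:, i::step, j::step] for i in range(step) for j in range(step)]

def atrous_merge(patches, step=2):
    """Inverse of atrous_split: write each sub-map back to its strided positions."""
    c, hs, ws = patches[0].shape
    out = np.empty((c, hs * step, ws * step), dtype=patches[0].dtype)
    k = 0
    for i in range(step):
        for j in range(step):
            out[:, i::step, j::step] = patches[k]
            k += 1
    return out

def placeholder_scan(seq):
    """Stand-in for a selective scan: a causal running mean over a (C, N) sequence.
    This is NOT an SSM; it only marks where the scan would happen."""
    n = seq.shape[1]
    return np.cumsum(seq, axis=1) / np.arange(1, n + 1)

def es2d_sketch(x, step=2):
    """Scan each strided sub-map separately, then merge back to (C, H, W)."""
    scanned = []
    for p in atrous_split(x, step):
        c, hs, ws = p.shape
        seq = p.reshape(c, hs * ws)          # flatten the sub-map into a token sequence
        scanned.append(placeholder_scan(seq).reshape(c, hs, ws))
    return atrous_merge(scanned, step)

x = np.arange(2 * 4 * 4, dtype=np.float64).reshape(2, 4, 4)
patches = atrous_split(x)
tokens_per_scan = patches[0].shape[1] * patches[0].shape[2]
y = es2d_sketch(x)
```

With a step of 2, a 4×4 map yields four sub-maps of 2×2 tokens each, so every scan processes a quarter of the tokens while the merged output keeps the original spatial shape.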