Fusion-Mamba for Cross-modality Object Detection


14 Apr 2024 | Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang
The paper "Fusion-Mamba for Cross-modality Object Detection" addresses the challenge of improving object detection performance by fusing complementary information from different modalities, such as infrared (IR) and visible (RGB) images. The authors propose a novel method called Fusion-Mamba, which uses an improved Mamba framework with a gating mechanism to associate cross-modal features in a hidden state space. This approach aims to reduce modality disparities and enhance the representation consistency of fused features.

Fusion-Mamba consists of two main components: the State Space Channel Swapping (SSCS) module and the Dual State Space Fusion (DSSF) module. The SSCS module performs shallow feature fusion by swapping and enhancing channel features across the two modalities, while the DSSF module performs deep feature fusion in the hidden state space through a gated attention mechanism. Together, these modules reduce modality differences and improve the effectiveness of feature fusion.

Extensive experiments on three public RGB-IR object detection datasets (LLVIP, $M^3FD$, and FLIR-Aligned) show that Fusion-Mamba outperforms state-of-the-art methods, achieving significantly higher mAP scores. The method also offers better inference efficiency than Transformer-based fusion methods, making it a robust and efficient solution for cross-modality object detection.
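To make the two-stage fusion pipeline concrete, below is a minimal PyTorch sketch of how an SSCS-style channel swap followed by a DSSF-style gated fusion could be wired together. This is an illustration based only on the summary above, not the authors' implementation: the half-channel swap, the sigmoid gate, and the convolutional stand-ins for the Mamba state space blocks are all assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the two fusion stages described above. The names
# (SSCS, DSSF), the half-channel swap, and the sigmoid gate are assumptions;
# the paper's actual Mamba state space blocks are replaced here by plain
# convolutions acting as stand-ins.

class SSCS(nn.Module):
    """Shallow fusion: swap half of the channels between RGB and IR features."""
    def __init__(self, channels: int):
        super().__init__()
        self.enhance_rgb = nn.Conv2d(channels, channels, kernel_size=1)
        self.enhance_ir = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        c = rgb.shape[1] // 2
        # Exchange the first half of the channels across modalities,
        # then enhance each swapped feature map with a 1x1 convolution.
        swapped_rgb = torch.cat([ir[:, :c], rgb[:, c:]], dim=1)
        swapped_ir = torch.cat([rgb[:, :c], ir[:, c:]], dim=1)
        return self.enhance_rgb(swapped_rgb), self.enhance_ir(swapped_ir)

class DSSF(nn.Module):
    """Deep fusion: gated cross-modal blending of hidden representations."""
    def __init__(self, channels: int):
        super().__init__()
        # Stand-ins for the paper's Mamba state space blocks.
        self.ssm_rgb = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.ssm_ir = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Gate computed from both modalities, one weight per channel/location.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        h_rgb, h_ir = self.ssm_rgb(rgb), self.ssm_ir(ir)
        g = self.gate(torch.cat([h_rgb, h_ir], dim=1))
        # Gated blend of the two modality-specific hidden representations.
        return g * h_rgb + (1.0 - g) * h_ir

# Usage: fuse 64-channel feature maps from the two backbone branches.
rgb_feat = torch.randn(1, 64, 80, 80)
ir_feat = torch.randn(1, 64, 80, 80)
sscs, dssf = SSCS(64), DSSF(64)
fused = dssf(*sscs(rgb_feat, ir_feat))
print(fused.shape)  # torch.Size([1, 64, 80, 80])
```

The design intent this sketch tries to capture is the division of labor described in the summary: a cheap channel-level exchange early on so each branch sees the other modality's features, followed by a learned gate that decides, per location, how much of each modality's deep representation enters the fused output.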