MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion

12 Apr 2024 | Zhe Li, Haiwei Pan*, Kejia Zhang, Yuhua Wang, Fengming Yu
**Abstract:** Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image to enhance visual tasks. While deep neural networks have driven significant progress in MMIF, existing methods often struggle with the local inductive bias of CNNs or the quadratic computational complexity of Transformers. To address these issues, the paper proposes MambaDFuse, a Mamba-based dual-phase model for MMIF. The model consists of three main components: a dual-level feature extractor, a dual-phase feature fusion module, and a fused image reconstruction module. The dual-level feature extractor uses CNNs to capture low-level features and Mamba blocks to capture long-range features from each single-modality image. The dual-phase feature fusion module combines complementary information from the two modalities, applying channel exchange for shallow fusion and enhanced Multi-modal Mamba (M3) blocks for deep fusion. The fused image reconstruction module then generates the final fused image through an inverse transformation. Extensive experiments demonstrate that MambaDFuse achieves promising results in infrared-visible image fusion and medical image fusion, outperforming state-of-the-art methods in both subjective visual assessment and objective evaluation metrics. MambaDFuse also improves performance in downstream tasks such as object detection.

**Keywords:** Image fusion, Mamba, multi-modality images, cross-modality interaction
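To make the dual-phase pipeline concrete, the sketch below is a minimal, hypothetical PyTorch skeleton of the three components named in the abstract. Everything here is an illustrative assumption rather than the paper's implementation: the names `MambaDFuseSketch` and `channel_exchange` are invented, the Mamba blocks are stood in by `nn.Identity`, and the M3 deep-fusion blocks are approximated by a 1x1 convolution.

```python
import torch
import torch.nn as nn


def channel_exchange(feat_a, feat_b, ratio=0.5):
    # Shallow fusion: swap the first `ratio` fraction of channels between
    # the two modality feature maps. (Hypothetical exchange rule; the
    # paper's exact channel-selection criterion may differ.)
    k = int(feat_a.shape[1] * ratio)
    out_a = torch.cat([feat_b[:, :k], feat_a[:, k:]], dim=1)
    out_b = torch.cat([feat_a[:, :k], feat_b[:, k:]], dim=1)
    return out_a, out_b


class MambaDFuseSketch(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # Dual-level feature extractor: a CNN stem for low-level features,
        # followed by a placeholder for the Mamba blocks that model
        # long-range dependencies (replace with a real Mamba layer).
        self.stem_ir = nn.Conv2d(1, dim, 3, padding=1)
        self.stem_vis = nn.Conv2d(1, dim, 3, padding=1)
        self.mamba_ir = nn.Identity()   # stand-in for Mamba blocks
        self.mamba_vis = nn.Identity()  # stand-in for Mamba blocks
        # Deep fusion: stand-in for the enhanced Multi-modal Mamba (M3)
        # blocks, approximated here by a 1x1 conv over the concatenation.
        self.deep_fuse = nn.Conv2d(2 * dim, dim, 1)
        # Fused image reconstruction (the abstract's inverse
        # transformation), approximated by a conv back to image space.
        self.reconstruct = nn.Conv2d(dim, 1, 3, padding=1)

    def forward(self, ir, vis):
        # Dual-level feature extraction per modality.
        f_ir = self.mamba_ir(self.stem_ir(ir))
        f_vis = self.mamba_vis(self.stem_vis(vis))
        # Phase 1: shallow fusion by channel exchange.
        f_ir, f_vis = channel_exchange(f_ir, f_vis)
        # Phase 2: deep fusion of the exchanged features.
        fused = self.deep_fuse(torch.cat([f_ir, f_vis], dim=1))
        # Reconstruction of the fused image.
        return self.reconstruct(fused)


# Usage: fuse a dummy infrared/visible pair.
ir = torch.randn(1, 1, 128, 128)
vis = torch.randn(1, 1, 128, 128)
fused = MambaDFuseSketch()(ir, vis)
print(fused.shape)  # torch.Size([1, 1, 128, 128])
```

Channel exchange is a parameter-free way to let each modality's stream see the other's shallow statistics before the learned deep-fusion stage; in the actual model, the identity and 1x1-conv placeholders would be replaced by Mamba state-space blocks and M3 blocks.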