The paper introduces High-order Vision Mamba UNet (H-vmunet), a model for medical image segmentation. It builds on state-space models (SSMs), in particular the 2D-selective-scan (SS2D) operation, to improve the extraction of both global and local features. The key contributions are:
1. **High-order 2D-selective-scan (H-SS2D)**: This module progressively reduces redundant information during SS2D operations through higher-order interactions, maintaining a strong global receptive field while minimizing redundancy.
2. **Local-SS2D module**: Enhances the learning of local features at each order of interaction.
3. **H-vmunet architecture**: Combines the H-SS2D module with the U-Net framework, yielding a 6-layer U-shaped network composed of an encoder, a decoder, and skip connections.
4. **Ablation experiments**: Verify the effectiveness of the proposed modules and operations, showing that H-SS2D significantly improves performance and reduces parameter count.
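The encoder, decoder, and skip-connection wiring of a U-shaped network can be illustrated with a toy sketch. This is not the H-vmunet implementation: the 1D signal, the averaging and repetition stand-ins for the encoder and decoder stages, fusion by addition, and the `depth` value are all assumptions chosen to keep the example in plain Python. The function names are hypothetical.

```python
def downsample(x):
    """Halve the length by averaging adjacent pairs (stand-in for an encoder stage)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample(x):
    """Double the length by repeating each value (stand-in for a decoder stage)."""
    return [v for v in x for _ in range(2)]


def u_shaped_pass(signal, depth=5):
    """Run a toy U-shaped pass; len(signal) must be divisible by 2**depth."""
    if len(signal) % (2 ** depth) != 0:
        raise ValueError("signal length must be divisible by 2**depth")

    skips = []
    x = signal
    # Encoder path: remember the features at each resolution, then downsample.
    for _ in range(depth):
        skips.append(x)
        x = downsample(x)

    # Decoder path: upsample, then fuse with the encoder features of the
    # same resolution (assumed here to be element-wise addition).
    for _ in range(depth):
        x = upsample(x)
        skip = skips.pop()
        x = [a + b for a, b in zip(x, skip)]
    return x


if __name__ == "__main__":
    out = u_shaped_pass([1.0] * 32, depth=5)
    print(len(out), out[0])
```

The point of the sketch is the stack discipline: features saved last in the encoder are consumed first in the decoder, so every decoder stage is paired with the encoder stage of matching resolution.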
The model was evaluated on three public medical image datasets (ISIC2017, Spleen, and CVC-ClinicDB) and demonstrated strong competitiveness, reducing parameters by 67.28% compared to traditional Vision Mamba UNet (VM-UNet) while improving segmentation performance. The code for H-vmunet is available on GitHub.
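To make the reported 67.28% figure concrete, the following minimal sketch shows how a relative parameter reduction is computed. The helper name `param_reduction_pct` and the parameter counts in the usage example are hypothetical placeholders, not values taken from the paper.

```python
def param_reduction_pct(baseline_params: int, proposed_params: int) -> float:
    """Percentage reduction in parameter count relative to the baseline model."""
    if baseline_params <= 0:
        raise ValueError("baseline_params must be positive")
    return 100.0 * (baseline_params - proposed_params) / baseline_params


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    baseline = 10_000_000
    proposed = 3_272_000
    print(f"{param_reduction_pct(baseline, proposed):.2f}% fewer parameters")
```

Under this definition, a 67.28% reduction means the smaller model keeps roughly one third (32.72%) of the baseline's parameters.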