VM-UNet: Vision Mamba UNet for Medical Image Segmentation

4 Feb 2024 | Jiacheng Ruan, Suncheng Xiang
VM-UNet is a model for medical image segmentation based on State Space Models (SSMs), specifically Mamba. It addresses the limitations of CNNs and Transformers in this task by leveraging the long-range modeling capability and linear computational complexity of SSMs.
VM-UNet is a U-shaped architecture with the Visual State Space (VSS) block as its core component, which lets it capture extensive contextual information. The encoder-decoder structure is asymmetrical: the encoder uses VSS blocks and patch merging operations for feature extraction, and the decoder uses VSS blocks and patch expanding operations for feature restoration. Skip connections use a simple additive operation. Because it includes no specially designed modules, VM-UNet represents the most basic form of a pure SSM-based segmentation model.
Comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets show that VM-UNet is competitive in medical image segmentation, outperforming the compared models on metrics such as mIoU, DSC, and Acc. Ablation studies show that stronger pretrained weights significantly improve its performance.
The model is trained with a combined Binary Cross-Entropy and Dice loss (BceDice loss) for binary segmentation and a combined Cross-Entropy and Dice loss (CeDice loss) for multi-class segmentation. It is initialized with the pretrained weights of VMamba-S, which is pre-trained on ImageNet-1k, and trained on a single NVIDIA RTX A6000 GPU with a batch size of 32 and the AdamW optimizer at a learning rate of 1×10⁻³. Overall, the results demonstrate the potential of SSM-based models in this domain.
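The BceDice loss mentioned above can be illustrated with a minimal sketch in plain Python. The function name `bce_dice_loss`, the equal default weights, the smoothing term, and the flat-list inputs are assumptions made for illustration and are not taken from the summary; an actual training setup would compute the same quantities on tensors.

```python
import math


def bce_dice_loss(probs, targets, bce_weight=1.0, dice_weight=1.0,
                  eps=1e-7, smooth=1.0):
    """Weighted sum of binary cross-entropy and Dice loss.

    probs   -- predicted foreground probabilities in [0, 1], one per pixel
    targets -- ground-truth labels (0 or 1), same length as probs
    The weights, eps, and smooth are illustrative defaults (assumptions).
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and equal in length")

    # Clamp probabilities so that log() never receives 0.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]

    # Binary cross-entropy, averaged over all pixels.
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(clipped, targets)
    ) / len(probs)

    # Soft Dice coefficient; the smoothing term avoids division by zero
    # when both prediction and target are empty.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)

    return bce_weight * bce + dice_weight * (1.0 - dice)


if __name__ == "__main__":
    mask = [1, 0, 1, 0]
    print(bce_dice_loss([0.9, 0.1, 0.8, 0.2], mask))  # close prediction, small loss
    print(bce_dice_loss([0.1, 0.9, 0.2, 0.8], mask))  # inverted prediction, large loss
```

The cross-entropy term penalizes each pixel independently, while the Dice term measures overlap over the whole mask, which is why the two are often combined when the foreground region is small. The CeDice loss for multi-class segmentation follows the same pattern with a categorical cross-entropy term.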
Future work includes exploring better segmentation modules based on SSM mechanisms, optimizing SSMs for real-world applications, investigating segmentation performance at higher resolutions, and applying SSMs to other medical imaging tasks.