14 Mar 2024 | Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao
The paper "VM-UNET-V2: Rethinking Vision Mamba UNet for Medical Image Segmentation" by Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, and Xianping Tao introduces VM-UNET-V2, a model that combines State Space Models (SSMs) with the UNet architecture for medical image segmentation. The authors target two common limitations in this task: CNNs struggle to capture long-range dependencies, and Transformers have computational complexity that grows quadratically with the number of input tokens.
Inspired by the Mamba architecture, VM-UNET-V2 uses Visual State Space (VSS) blocks to capture extensive contextual information and a Semantics and Detail Infusion (SDI) module to fuse low-level and high-level features. Experiments on public datasets, including ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB, and ETIS-LaribPolypDB, show that VM-UNET-V2 achieves competitive segmentation performance. The paper also reports on the model's computational cost, highlighting favorable inference speed, GPU memory usage, and floating-point operation (FLOP) counts. It concludes by discussing the model's effectiveness and its potential for further advances in medical image segmentation.
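
The complexity argument in the summary can be made concrete with a toy example. The sketch below is not the paper's VSS block: it assumes a scalar, time-invariant state space recurrence (Mamba-style models use input-dependent parameters and a much larger state), and the function names and the parameters `a`, `b`, `c` are hypothetical, chosen only for illustration. It shows that a state space scan performs one update per token, while full self-attention scores every pair of tokens.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy scalar state space recurrence (illustrative assumption, not VSS).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    One state update per input element, so cost is linear in len(xs).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # carry context forward through the hidden state
        ys.append(c * h)
    return ys


def scan_step_count(num_tokens):
    """State updates needed by the recurrence above: one per token."""
    return num_tokens


def attention_pair_count(num_tokens):
    """Query-key pairs scored by full self-attention: every token with every token."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    # An impulse at the first position decays through the state,
    # so later outputs still depend on an early input.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))

    # Hypothetical 64 x 64 grid of image patches flattened to a sequence.
    tokens = 64 * 64
    print(scan_step_count(tokens), attention_pair_count(tokens))
```

In this toy setting, doubling the number of tokens doubles the scan's work but quadruples the number of attention pairs, which is the gap that motivates SSM-based blocks for high-resolution medical images.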