The paper introduces VM-UNet, a pure State Space Model (SSM)-based architecture for medical image segmentation. VM-UNet is designed to address the limitations of the two dominant model families: CNN-based models, which have limited long-range modeling capability, and Transformer-based models, which incur quadratic computational complexity. The model uses the Visual State Space (VSS) block as its basic building block to capture extensive contextual information, and it adopts an asymmetrical encoder-decoder structure. Experiments on the ISIC17, ISIC18, and Synapse datasets show that VM-UNet performs competitively. The authors present it as the first pure SSM-based model for medical image segmentation. The paper also includes ablation studies and discusses future directions: designing better modules, reducing the parameter count, and extending the approach to higher resolutions and other medical imaging tasks.