Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation


12 May 2021 | Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, and Manning Wang
Swin-Unet is a pure Transformer-based, U-shaped architecture for medical image segmentation. Its encoder splits the input image into non-overlapping patches, which are fed into a hierarchical Swin Transformer with shifted windows to extract context features and learn deep representations. A symmetric Swin Transformer-based decoder upsamples these features with a patch expanding layer and fuses them with multi-scale encoder features via skip connections to restore spatial resolution.

Extensive experiments on the Synapse multi-organ and ACDC cardiac segmentation datasets show that Swin-Unet outperforms CNN-based and hybrid CNN-Transformer methods in segmentation accuracy and generalization. The results indicate that a pure Transformer design, built entirely from Swin Transformer blocks, can better capture global and long-range semantic interactions while remaining efficient and robust. The authors also note the approach's potential for broader applications such as 3D medical image segmentation.
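To make the decoder's up-sampling step concrete, below is a minimal sketch (not the authors' reference code) of a patch expanding layer as described in the summary: a linear projection doubles the channel dimension, and the result is rearranged so that spatial resolution doubles while the channel count is halved. The `(B, H*W, C)` token layout, class name, and constructor arguments are assumptions for illustration.

```python
import torch
import torch.nn as nn


class PatchExpanding(nn.Module):
    """Hypothetical sketch: upsample a token map by 2x per spatial dimension."""

    def __init__(self, input_resolution, dim):
        super().__init__()
        self.input_resolution = input_resolution  # (H, W) of the incoming token grid
        self.dim = dim
        self.expand = nn.Linear(dim, 2 * dim, bias=False)  # C -> 2C
        self.norm = nn.LayerNorm(dim // 2)

    def forward(self, x):
        # x: (B, H*W, C) sequence of patch tokens
        H, W = self.input_resolution
        B, L, C = x.shape
        assert L == H * W, "token count must match the stated resolution"

        x = self.expand(x)                      # (B, H*W, 2C)
        x = x.view(B, H, W, 2 * C)
        # Split the 2C channels into a 2x2 spatial neighborhood, C/2 channels each.
        x = x.view(B, H, W, 2, 2, C // 2)
        x = x.permute(0, 1, 3, 2, 4, 5)         # (B, H, 2, W, 2, C/2)
        x = x.reshape(B, 2 * H, 2 * W, C // 2)  # resolution doubled, channels halved
        x = x.view(B, 4 * H * W, C // 2)        # back to a token sequence
        return self.norm(x)


if __name__ == "__main__":
    layer = PatchExpanding(input_resolution=(7, 7), dim=768)
    tokens = torch.randn(1, 49, 768)
    print(layer(tokens).shape)  # torch.Size([1, 196, 384])
```

In a full Swin-Unet-style decoder, one such layer would sit between successive stages of Swin Transformer blocks, mirroring the patch merging layers of the encoder, with skip connections concatenating encoder features at each resolution.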