SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation

23 Apr 2024 | Shehan Perera*, Pouyan Navard*, Alper Yilmaz
**Affiliation:** Photogrammetric Computer Vision Lab, The Ohio State University

**Abstract:** The adoption of Vision Transformers (ViTs) has significantly advanced 3D medical image segmentation by enhancing global contextual understanding. However, state-of-the-art ViT-based architectures demand large-scale computing resources and struggle on limited datasets. To address these challenges, we present SegFormer3D, a hierarchical Transformer that computes attention across multiscale volumetric features. SegFormer3D uses an all-MLP decoder to aggregate local and global attention features, producing highly accurate segmentation masks. This memory-efficient Transformer preserves the performance of larger models in a compact design, offering a 33× reduction in parameters and a 13× reduction in GFLOPs compared to current state-of-the-art (SOTA) models. Benchmarked on the Synapse, BraTS, and ACDC datasets, SegFormer3D achieves competitive results, demonstrating its effectiveness and efficiency in 3D medical image segmentation.

**Introduction:** Deep learning has transformed healthcare by enabling the analysis of complex medical data. 3D volumetric image segmentation is crucial for tasks such as tumor and multi-organ localization. Traditional encoder-decoder architectures struggle with limited receptive fields, while Transformer-based approaches such as ViTs excel at capturing global relationships. However, ViTs often require large-scale datasets for pretraining and lack the generalization capabilities of CNNs. SegFormer3D addresses these issues by encoding feature maps at multiple scales and using an efficient self-attention module to reduce computational complexity.
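The efficient self-attention module mentioned above can be illustrated with a minimal, single-head NumPy sketch. The key idea (following the SegFormer family of models) is to shrink the key/value sequence by a reduction ratio `r`, so the attention map is `N × (N/r)` instead of `N × N`. This is an assumption-laden toy version: the averaging-based reduction stands in for the strided linear projection used in practice, and the function and weight names are illustrative, not from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def efficient_self_attention(x, w_q, w_k, w_v, r):
    """Single-head attention with sequence reduction on keys/values.

    x: (N, C) flattened volumetric tokens; r: reduction ratio.
    Keys and values are computed on a subsampled sequence of length
    N // r, shrinking the attention map from N x N to N x (N // r).
    """
    n, c = x.shape
    # Sequence reduction: merge r neighboring tokens by averaging
    # (a stand-in for the strided linear projection in the paper).
    x_red = x[: (n // r) * r].reshape(n // r, r, c).mean(axis=1)
    q = x @ w_q                            # (N, C)
    k = x_red @ w_k                        # (N // r, C)
    v = x_red @ w_v                        # (N // r, C)
    attn = softmax(q @ k.T / np.sqrt(c))   # (N, N // r)
    return attn @ v                        # (N, C)

# A 4x4x4 volume flattened to 64 tokens with 8 channels:
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8))
w = [rng.standard_normal((8, 8)) for _ in range(3)]
out = efficient_self_attention(x, *w, r=4)
print(out.shape)  # (64, 8) — the full token sequence is preserved
```

With `r = 4`, the attention matrix has 64 × 16 entries instead of 64 × 64; for real volumetric inputs with tens of thousands of tokens, this linear factor is what makes attention tractable at the finest feature scales.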
**Method:** SegFormer3D combines a hierarchical Transformer encoder, which uses overlapped patch merging to preserve local continuity and an efficient self-attention mechanism to capture long-range dependencies, with an all-MLP decoder that simplifies the decoding stage, ensuring efficient and consistent segmentation across diverse datasets.

**Experimental Results:** Experiments on three benchmark datasets (Synapse, BraTS, ACDC) validate SegFormer3D's effectiveness and efficiency. It performs competitively against SOTA models with significantly fewer parameters and lower computational cost: on BraTS it outperforms well-established solutions, on Synapse it ranks second only to nnFormer, and on ACDC it is within 1% of SOTA performance. These results highlight the potential of lightweight, efficient Transformers in 3D medical image segmentation.

**Conclusion:** SegFormer3D is a lightweight architecture that reduces parameter count and computational complexity while maintaining high performance. This broadens accessibility and promotes practical applications in medical imaging, especially in settings with limited computational resources.
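The all-MLP decoder described above can be sketched as follows: each hierarchical stage output is projected to a shared channel width by a channel-wise linear layer, upsampled to the finest resolution, concatenated, and fused by further linear layers into per-class logits. This NumPy sketch is a simplified illustration under stated assumptions — nearest-neighbor upsampling, four stages with power-of-two downsampling, and invented names (`all_mlp_decoder`, `fuse_w`, etc.) that do not come from the paper's implementation.

```python
import numpy as np

def upsample3d_nearest(x, scale):
    """Nearest-neighbor upsampling of a (C, D, H, W) volume."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2).repeat(scale, axis=3)

def all_mlp_decoder(features, proj_ws, fuse_w, cls_w):
    """Toy all-MLP decoder over hierarchical 3D features.

    features: list of (C_i, D_i, H_i, W_i) stage outputs, finest first.
    Each stage is projected to a shared channel width (a matmul over the
    channel axis, i.e. a 1x1x1 linear layer), upsampled to the finest
    resolution, concatenated, then fused and classified by linear layers.
    """
    target_depth = features[0].shape[1]   # finest depth D
    unified = []
    for f, w in zip(features, proj_ws):
        # channel-wise linear projection: (C_i, d, h, w) -> (C_common, d, h, w)
        proj = np.tensordot(w, f, axes=([1], [0]))
        unified.append(upsample3d_nearest(proj, target_depth // f.shape[1]))
    cat = np.concatenate(unified, axis=0)               # (4*C_common, D, H, W)
    fused = np.tensordot(fuse_w, cat, axes=([1], [0]))  # (C_common, D, H, W)
    return np.tensordot(cls_w, fused, axes=([1], [0]))  # (K, D, H, W)

# Four stages of a toy feature pyramid: channels grow as resolution shrinks.
rng = np.random.default_rng(1)
chans, sizes = [8, 16, 32, 64], [8, 4, 2, 1]
feats = [rng.standard_normal((c, s, s, s)) for c, s in zip(chans, sizes)]
proj_ws = [rng.standard_normal((16, c)) for c in chans]   # C_common = 16
fuse_w = rng.standard_normal((16, 4 * 16))
cls_w = rng.standard_normal((3, 16))                      # K = 3 classes
logits = all_mlp_decoder(feats, proj_ws, fuse_w, cls_w)
print(logits.shape)  # (3, 8, 8, 8): per-class logits at the finest scale
```

The appeal of this design is that the decoder is nothing but channel-mixing linear layers plus upsampling — no convolutions or attention — which keeps its parameter and FLOP cost small relative to the encoder.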