Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

27 Mar 2024 | Zhiheng Cheng, Qingyue Wei, Hongru Zhu, Yan Wang, Liangqiong Qu, Wei Shao, Yuyin Zhou
The paper introduces H-SAM, a prompt-free adaptation of the Segment Anything Model (SAM) designed for efficient fine-tuning on medical images. H-SAM employs a two-stage hierarchical decoding procedure: in the first stage, SAM's original decoder generates a prior probabilistic mask, which then guides a more intricate decoding process in the second stage.

Key contributions include a class-balanced, mask-guided self-attention mechanism that addresses unbalanced label distributions, and a learnable mask cross-attention mechanism that modulates spatial interactions among different image regions. A hierarchical pixel decoder complements the hierarchical Transformer decoder, improving precision and the model's ability to capture fine-grained detail.

H-SAM delivers significant improvements over existing prompt-free SAM variants, achieving a 4.78% gain in average Dice score for multi-organ segmentation while using only 10% of the 2D slices for training. Notably, H-SAM outperforms state-of-the-art semi-supervised models without using any unlabeled data, underscoring its potential for medical imaging applications. The code for H-SAM is available at <https://github.com/Ccccccczb404/H-SAM>.
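To make the two-stage idea concrete, below is a minimal PyTorch sketch of hierarchical decoding in which a stage-one prior mask biases the attention of a stage-two decoder. Every name, shape, and wiring choice here (MaskGuidedSelfAttention, HierarchicalDecoder, the additive log-probability attention bias, treating class 0 as background) is an illustrative assumption drawn from the summary above, not the authors' implementation.

```python
# A minimal sketch of two-stage hierarchical decoding, assuming flattened
# SAM image-embedding tokens of shape (B, N, C). Illustrative only.
import torch
import torch.nn as nn


class MaskGuidedSelfAttention(nn.Module):
    """Self-attention over image tokens, biased by a stage-one prior mask.

    The prior foreground probability is turned into an additive attention
    bias (its log), so stage two attends preferentially to regions the
    first decoder already marks as likely organ. This is a hypothetical
    stand-in for the paper's class-balanced, mask-guided self-attention.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, fg_prob: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, C); fg_prob: (B, N) with values in [0, 1].
        b, n, _ = tokens.shape
        bias = torch.log(fg_prob.clamp_min(1e-6))              # (B, N)
        bias = bias.unsqueeze(1).expand(b, n, n)               # bias[b, i, j] = log p_j
        bias = bias.repeat_interleave(self.attn.num_heads, 0)  # (B*heads, N, N)
        out, _ = self.attn(tokens, tokens, tokens, attn_mask=bias)
        return out


class HierarchicalDecoder(nn.Module):
    """Stage 1 predicts a prior probabilistic mask; stage 2 refines it
    under mask-guided attention. Both outputs can be supervised."""

    def __init__(self, dim: int = 256, num_classes: int = 9):
        super().__init__()
        self.stage1_head = nn.Linear(dim, num_classes)  # stands in for SAM's decoder
        self.guided_attn = MaskGuidedSelfAttention(dim)
        self.stage2_head = nn.Linear(dim, num_classes)

    def forward(self, tokens: torch.Tensor):
        prior_logits = self.stage1_head(tokens)         # (B, N, K)
        prior = prior_logits.softmax(dim=-1)
        fg_prob = 1.0 - prior[..., 0]                   # class 0 assumed background
        refined = self.guided_attn(tokens, fg_prob)
        final_logits = self.stage2_head(refined)        # (B, N, K)
        return prior_logits, final_logits


if __name__ == "__main__":
    # e.g. SAM ViT image embeddings, flattened to (B, H*W, C).
    tokens = torch.randn(2, 32 * 32, 256)
    prior_logits, final_logits = HierarchicalDecoder()(tokens)
    print(prior_logits.shape, final_logits.shape)  # both (2, 1024, 9)
```

The key design point the sketch illustrates is that the prior mask enters the second stage through the attention computation itself rather than by simple concatenation, which is what lets the refinement stage focus capacity on regions the first pass deems relevant.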