SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images

5 Jul 2024 | Weiyi Xie, Nathalie Willems, Shubham Patil, Yang Li, and Mayank Kumar
This paper proposes a few-shot fine-tuning strategy for adapting the Segment Anything Model (SAM) to anatomical segmentation tasks in medical images. The method reformulates the mask decoder within SAM to use few-shot embeddings derived from a limited set of labeled images as prompts for querying anatomical objects. This approach reduces the need for time-consuming user interactions when labeling volumetric images: users manually segment a few 2D slices offline, and the embeddings of these annotated regions serve as effective prompts for online segmentation tasks.

The method prioritizes the efficiency of the fine-tuning process by training only the mask decoder while keeping the image encoder frozen. It is not limited to volumetric medical images but can be applied to any 2D/3D segmentation task. The method was evaluated on four datasets covering six anatomical segmentation tasks across two modalities, and compared with different prompting options within SAM and with the fully supervised nnU-Net. Results showed that the proposed method outperformed SAM using only point prompts (50% improvement in IoU) and performed on par with fully supervised methods while reducing the requirement for labeled data by at least an order of magnitude.

The paper discusses the challenges of using SAM for segmenting volumetric medical images, including the difficulty of providing accurate prompts and the ambiguity of predictions when anatomical structures are closely layered. It also highlights the limitations of relying solely on IoU and stability scores for selecting segmentation results. The proposed few-shot fine-tuning method uses SAM's image encoder to extract target embeddings from a set of few-shot images labeled for the specific segmentation task. These embeddings are used as prompts for the mask decoder, which is modified to accept few-shot target embeddings instead of user-defined prompts.
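The idea of turning a handful of annotated 2D slices into prompt embeddings can be sketched as masked average pooling of frozen-encoder features over the labeled regions. This is a minimal illustration, not the paper's exact implementation: the toy convolutional encoder stands in for SAM's ViT image encoder, and the pooling scheme is an assumption.

```python
import torch
import torch.nn as nn

def extract_target_embeddings(encoder, images, masks):
    """Pool frozen-encoder features inside each labeled region.

    images: (N, 3, H, W) few-shot slices; masks: (N, H, W) binary labels.
    Returns one prompt embedding per slice, stacked as (N, C).
    """
    encoder.eval()
    with torch.no_grad():                      # image encoder stays frozen
        feats = encoder(images)                # (N, C, h, w) feature maps
    # Downsample masks to the feature resolution, then average features
    # over the annotated pixels only (masked average pooling).
    m = nn.functional.interpolate(
        masks.unsqueeze(1).float(), size=feats.shape[-2:], mode="nearest"
    )                                          # (N, 1, h, w)
    pooled = (feats * m).sum(dim=(2, 3)) / m.sum(dim=(2, 3)).clamp(min=1e-6)
    return pooled                              # (N, C) few-shot prompt embeddings

# Toy stand-in for SAM's image encoder (assumption, for illustration only).
toy_encoder = nn.Conv2d(3, 256, kernel_size=16, stride=16)
imgs = torch.randn(5, 3, 64, 64)               # 5 labeled few-shot slices
msks = (torch.rand(5, 64, 64) > 0.5).long()    # binary anatomy masks
prompts = extract_target_embeddings(toy_encoder, imgs, msks)
print(prompts.shape)                           # torch.Size([5, 256])
```

In the paper's setup, these embeddings would replace the user-defined point or box prompts fed to the mask decoder, so no interaction is needed at inference time.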
The method is efficient and requires minimal labeled images (5-20) for fine-tuning, significantly reducing the training effort required. Experiments on various anatomical structures showed that the proposed method achieves comparable performance to SAM with accurate bounding-box prompts and outperforms SAM with point prompts. It also performs on par with the fully supervised nnU-Net approach while requiring significantly fewer labeled images. The method is efficient and practical for adapting SAM to medical image segmentation, providing a generic framework for token-query-based object detection and classification tasks beyond medical imaging.
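The decoder-only fine-tuning regime described above can be illustrated as follows. This is a hedged sketch: the stand-in encoder and decoder are toy modules (SAM's real mask decoder is additionally conditioned on the prompt embeddings), and only the freezing-and-updating pattern reflects the paper's training setup.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for SAM's components (assumptions, illustration only).
encoder = nn.Conv2d(3, 256, kernel_size=16, stride=16)   # frozen image encoder
decoder = nn.Sequential(nn.Conv2d(256, 64, 1), nn.Conv2d(64, 1, 1))  # mask head

for p in encoder.parameters():                 # freeze the image encoder
    p.requires_grad = False

# Only the decoder's parameters are handed to the optimizer.
opt = torch.optim.AdamW(decoder.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

imgs = torch.randn(4, 3, 64, 64)               # a mini-batch of 2D slices
gts = torch.rand(4, 1, 4, 4)                   # toy low-resolution targets
with torch.no_grad():                          # no gradients through encoder
    feats = encoder(imgs)
logits = decoder(feats)
loss = loss_fn(logits, gts)
loss.backward()                                # updates reach the decoder only
opt.step()

trainable = sum(p.numel() for p in decoder.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
print(trainable > 0, frozen == 0)              # True True
```

Keeping the encoder frozen is what makes 5-20 labeled slices sufficient: only the small decoder is optimized, so there are far fewer parameters to fit.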