5 Jul 2024 | Weiyi Xie, Nathalie Willems, Shubham Patil, Yang Li, and Mayank Kumar
The paper proposes a few-shot fine-tuning strategy for adapting the Segment Anything Model (SAM) to anatomical segmentation tasks in medical images. The key innovation is a reformulation of SAM's mask decoder so that few-shot embeddings, derived from a limited set of labeled images, serve as the prompts for querying anatomical objects. This reduces the need for time-consuming user interactions such as marking points and bounding boxes; instead, users manually segment a few 2D slices offline. For efficiency, only the mask decoder is trained while the image encoder stays frozen. Evaluated on four datasets covering six anatomical segmentation tasks, the method outperforms SAM with point prompts (a 50% improvement in IoU) and performs on par with fully supervised methods while requiring significantly less labeled data. The method is not limited to medical images and can be applied to any 2D/3D segmentation task.
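The comparison above is reported in IoU (intersection over union). As a reminder of what that metric measures, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `iou`, the nested-list mask format, and the convention for two empty masks are all assumptions made for illustration.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for row_pred, row_target in zip(pred, target):
        for p, t in zip(row_pred, row_target):
            if p and t:
                intersection += 1  # pixel is foreground in both masks
            if p or t:
                union += 1         # pixel is foreground in at least one mask

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0
    return intersection / union


# Toy 3x4 masks (hypothetical data, not taken from the paper).
target = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

score = iou(pred, target)  # 3 shared pixels / 5 pixels in the union
print(f"IoU = {score:.2f}")
```

In this toy case the prediction shares 3 foreground pixels with the reference and the union covers 5 pixels, so the IoU is 0.6. For 3D volumes the same ratio would be computed over voxels, typically with an array library rather than nested lists.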