PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation


23 Jan 2024 | Zhaozhi Xie, Bochen Guan, Weihao Jiang, Muyang Yi, Yue Ding, Hongtao Lu, Lei Zhang
PA-SAM is a novel prompt-driven adapter for the Segment Anything Model (SAM), designed to enhance the quality of image segmentation. SAM performs strongly across a wide range of segmentation tasks but often produces imprecise masks, particularly in real-world scenarios. PA-SAM addresses this by introducing a prompt adapter that extracts detailed information from the image and optimizes the mask decoder's features at both the sparse- and dense-prompt levels. Experimental results show that PA-SAM outperforms other SAM-based methods in high-quality, zero-shot, and open-set segmentation.

PA-SAM freezes the SAM component and trains only the prompt adapter, preserving SAM's object localization capability while generating high-quality segmentation maps. It achieves leading performance on the high-quality dataset HQSeg-44K, improving mIoU and mBIoU over the previous state of the art, and it also demonstrates promising results on zero-shot and open-set segmentation datasets.

The method uses adaptive detail enhancement and hard point mining to capture detailed information from images, with the prompt adapter adaptively selecting the detail information relevant to the original prompts. Architecturally, PA-SAM combines image features with the dense prompts and sends them, along with the sparse prompts, to the mask decoder; the output prompt features are then reintegrated into PA-SAM in a residual manner to optimize the mask decoder's feature representation. In this way, segmentation quality benefits from both detailed and less-detailed information.

PA-SAM is validated on several datasets, including HQSeg-44K, COCO, and SegInW, and shows significant improvements in segmentation quality. The results demonstrate that optimizing the detailed representation of intermediate features in the mask decoder is more beneficial for generating high-quality segmentation maps than training with the final features alone, and that the method resists detection errors better than competing approaches in zero-shot and open-set settings.
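The freeze-SAM, train-adapter setup and the residual reintegration of prompt features can be pictured with a short PyTorch-style sketch. The names below (`PromptAdapter`, `dense_head`, `build_optimizer`, the assumed SAM attribute layout) are illustrative placeholders rather than the paper's actual code; the sketch only mirrors the data flow described above.

```python
import torch
import torch.nn as nn

class PromptAdapter(nn.Module):
    """Illustrative prompt adapter: refines dense and sparse prompts using
    image features. Shapes and layer choices are assumptions, not the
    paper's implementation."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # Lightweight heads that predict detail-aware corrections.
        self.dense_head = nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1)
        self.sparse_head = nn.Linear(embed_dim, embed_dim)

    def forward(self, image_embed, dense_prompt, sparse_prompt):
        # Combine image features with the dense prompt, then predict
        # residual corrections at both prompt levels.
        dense_delta = self.dense_head(image_embed + dense_prompt)
        sparse_delta = self.sparse_head(sparse_prompt)
        # Residual reintegration: original prompts plus learned detail.
        return dense_prompt + dense_delta, sparse_prompt + sparse_delta


def build_optimizer(sam, adapter, lr=1e-4):
    """Training setup: SAM stays frozen, only the adapter's parameters
    are updated (preserving SAM's object localization capability)."""
    for p in sam.parameters():
        p.requires_grad_(False)
    return torch.optim.AdamW(adapter.parameters(), lr=lr)
```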
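Hard point mining can be thought of as selecting extra prompt points where the current mask prediction is least certain. The sketch below is a minimal illustration under that assumption (uncertainty measured as closeness of the mask probability to 0.5); the paper's actual sampling strategy may differ.

```python
import torch

def mine_hard_points(mask_logits: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Pick the k most uncertain pixels of a coarse mask as candidate
    point prompts. mask_logits: (B, 1, H, W). Returns (B, k, 2) pixel
    coordinates in (x, y) order. Purely illustrative."""
    b, _, h, w = mask_logits.shape
    prob = mask_logits.sigmoid()
    # Uncertainty is highest where the foreground probability is near 0.5.
    uncertainty = -(prob - 0.5).abs().flatten(1)   # (B, H*W)
    _, idx = uncertainty.topk(k, dim=1)            # indices of hardest pixels
    ys, xs = idx // w, idx % w
    return torch.stack([xs, ys], dim=-1).float()
```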