BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model


20 Mar 2024 | Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma
BA-SAM is a scalable bias-mode attention mask designed to enhance the Segment Anything Model (SAM)'s adaptability to varying image resolutions without requiring structural modifications. The challenge it addresses is SAM's performance degradation on images of differing resolutions: previous methods resize images or adjust patch sizes, which hinders the model's ability to retain its rich prior knowledge. BA-SAM instead reformulates the issue as a length extrapolation problem, in which the token sequence length varies while the patch size stays fixed.

The method introduces two key components. The first is a new scaling factor that keeps the magnitude of the attention layer's dot-product values consistent as the token sequence length changes (see the sketch below).
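The summary above does not spell out the exact formula, so the following is a minimal sketch assuming a length-aware scaling of the form log_m(n) / sqrt(d), in the spirit of entropy-invariant attention scaling, where m is the training token count and n is the current one; the function name and the `train_len` parameter are illustrative, not the paper's API.

```python
import math
import torch

def scaled_attention_logits(q, k, train_len):
    """Length-aware scaled dot-product logits (illustrative sketch).

    q, k: (batch, heads, n, d) query/key tensors.
    train_len: token sequence length m seen during training.

    Assumed scaling: logits are multiplied by log_m(n) / sqrt(d),
    so their magnitude stays roughly constant as n deviates from m.
    """
    n, d = q.shape[-2], q.shape[-1]
    # log_m(n) = ln(n) / ln(m); this equals 1 when n == m, i.e. it
    # recovers the standard 1/sqrt(d) scaling at the training length.
    length_scale = math.log(n) / math.log(train_len)
    return (q @ k.transpose(-2, -1)) * (length_scale / math.sqrt(d))
```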
The second component is a bias-mode attention mask that prioritizes neighboring information: it penalizes attention scores between distant query-key pairs, mitigating the negative effects of untrained distant information (a sketch follows).
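Below is a minimal sketch of such a distance-penalizing additive bias, assuming Euclidean distance on the 2D patch grid and ALiBi-style per-head slopes; the function name, slope schedule, and distance metric are illustrative choices, not the paper's exact design.

```python
import torch

def bias_mode_mask(grid_h, grid_w, num_heads):
    """Additive attention bias penalizing distant query-key pairs.

    Returns a (num_heads, N, N) tensor with N = grid_h * grid_w,
    to be added to the attention logits before softmax.

    Assumptions (illustrative only): distance is the Euclidean
    distance between patch positions on the 2D grid; per-head
    slopes follow a geometric schedule, as in ALiBi.
    """
    ys, xs = torch.meshgrid(
        torch.arange(grid_h), torch.arange(grid_w), indexing="ij"
    )
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()
    dist = torch.cdist(coords, coords)  # (N, N) pairwise distances
    # One negative slope per head: a larger slope imposes a stronger
    # locality prior, down-weighting far-away tokens more heavily.
    slopes = torch.tensor([2.0 ** (-(i + 1)) for i in range(num_heads)])
    return -slopes[:, None, None] * dist  # (num_heads, N, N)

# Usage sketch: logits = scaled_attention_logits(q, k, train_len)
#               logits = logits + bias_mode_mask(h, w, num_heads)
```

Because the bias depends only on grid geometry, it can be precomputed per resolution and added to any SAM-style attention layer without changing the model's weights or structure.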
BA-SAM has been evaluated on diverse datasets, including DIS5K, DUTS, ISIC, COD10K, and COCO, demonstrating effectiveness in both zero-shot and fine-tuning scenarios: it significantly mitigates performance degradation in the zero-shot setting and achieves state-of-the-art results with minimal fine-tuning. The authors additionally propose a generalized model and benchmark, showcasing BA-SAM's generalizability across all four datasets. The method is lightweight and can be seamlessly integrated into SAM-based models with minimal computational overhead, outperforming existing methods in both performance and generalization and offering a practical way to enhance SAM's adaptability to varying image resolutions.