1 Jun 2024 | Duojun Huang, Xinyu Xiong, Jie Ma, Jichang Li, Zequn Jie, Lin Ma, Guanbin Li
AlignSAM is a framework that aligns the Segment Anything Model (SAM) to open-world contexts using reinforcement learning (RL). It adapts SAM to diverse downstream tasks without modifying any of SAM's parameters, preserving the foundation model's generalization capability. An RL agent iteratively refines segmentation predictions by interacting with the frozen model: an RL policy network proposes informative prompt positions, while a semantic recalibration module assigns fine-grained labels to those prompts, enabling the framework to handle tasks with both explicit and implicit semantics.

Because prompts are generated automatically, the backbone network stays frozen while the framework adapts to each task. The agent learns to recommend optimal prompting positions, and the semantic recalibration module ensures that each selected prompt receives a precise foreground or background label. The overall loop is sketched in the code below.

Evaluated on multiple benchmark datasets spanning blur detection, shadow detection, glass detection, and salient object detection, AlignSAM achieves comparable or superior performance to state-of-the-art approaches in both efficiency and accuracy, particularly in scenarios that demand precise segmentation. Its ability to handle explicit and implicit semantics alike makes it a versatile solution across segmentation tasks.
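The iterative prompting loop lends itself to a compact sketch. The following is a minimal, runnable approximation of the pipeline described above, not the authors' implementation: the module names (PolicyNet, Recalibrator, sam_predict), the working resolution, the step count, and the REINFORCE-style update are all illustrative assumptions, and SAM itself is stubbed with random masks so the snippet runs without model weights. In a real setup, sam_predict would wrap a frozen SamPredictor from the segment_anything package.

```python
# Hedged sketch of an AlignSAM-style prompting loop. All names and
# hyperparameters below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

H = W = 64      # working resolution of the state / prompt grid (assumed)
STEPS = 5       # number of prompting rounds per episode (assumed)

class PolicyNet(nn.Module):
    """Scores every spatial position as a candidate prompt (assumed design)."""
    def __init__(self):
        super().__init__()
        # State: image features (3 channels here for simplicity) + current mask.
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),
        )
    def forward(self, state):                  # state: (B, 4, H, W)
        logits = self.net(state).flatten(1)    # (B, H*W) position logits
        return torch.distributions.Categorical(logits=logits)

class Recalibrator(nn.Module):
    """Assigns a foreground/background label to the chosen point (assumed)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
    def forward(self, state, ys, xs):
        feats = state[torch.arange(state.size(0)), :, ys, xs]  # (B, 4)
        return self.net(feats).argmax(-1)                      # 0=bg, 1=fg

def sam_predict(points, labels):
    """Stub for the frozen SAM promptable decoder; returns a (B,1,H,W) mask.
    In practice this would call SamPredictor.predict(point_coords=...,
    point_labels=...) from the segment_anything package."""
    return torch.rand(points.size(0), 1, H, W)

def iou(pred, gt, eps=1e-6):
    inter = (pred * gt).sum((1, 2, 3))
    union = ((pred + gt) > 0).float().sum((1, 2, 3))
    return inter / (union + eps)

policy, recal = PolicyNet(), Recalibrator()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

image = torch.rand(1, 3, H, W)                 # dummy input features
gt = (torch.rand(1, 1, H, W) > 0.5).float()    # dummy ground-truth mask
mask = torch.zeros(1, 1, H, W)                 # segmentation starts empty
points, labels = [], []
log_probs, rewards, prev_iou = [], [], torch.zeros(1)

for _ in range(STEPS):
    state = torch.cat([image, mask], dim=1)
    dist = policy(state)
    idx = dist.sample()                        # pick one prompt position
    ys, xs = idx // W, idx % W
    points.append(torch.stack([xs, ys], -1).float())
    labels.append(recal(state, ys, xs))        # recalibrated point label
    mask = (sam_predict(torch.stack(points, 1),
                        torch.stack(labels, 1)) > 0.5).float()
    cur_iou = iou(mask, gt)
    rewards.append(cur_iou - prev_iou)         # reward = IoU improvement
    log_probs.append(dist.log_prob(idx))
    prev_iou = cur_iou

# REINFORCE update: reinforce prompt choices that increased IoU.
returns = torch.stack(rewards).flip(0).cumsum(0).flip(0)
loss = -(torch.stack(log_probs) * returns.detach()).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The per-step IoU gain is used here as the reward, a common choice for interactive segmentation agents; the actual reward design and training of the recalibration module in AlignSAM may differ.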