ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly Segmentation


29 Jan 2024 | Shengze Li, Jianjian Cao, Peng Ye, Yuhan Ding, Chongjun Tu and Tao Chen*
The paper introduces ClipSAM, a novel framework that combines CLIP and SAM to enhance zero-shot anomaly segmentation (ZSAS). CLIP excels in semantic understanding and global feature alignment, while SAM is strong in fine-grained segmentation and mask refinement. ClipSAM leverages CLIP for initial localization and rough segmentation, then uses SAM to refine these results. The framework includes two main components: the Unified Multi-scale Cross-modal Interaction (UMCI) module and the Multi-level Mask Refinement (MMR) module. UMCI integrates language and visual features at different scales and directions to improve CLIP's localization accuracy. MMR extracts point and box prompts from CLIP's localization results to guide SAM in generating precise masks. Extensive experiments on the MVTec-AD and VisA datasets demonstrate that ClipSAM outperforms existing methods, achieving state-of-the-art performance across multiple metrics. The paper also includes ablation studies to validate the effectiveness of each component and the choice of hyperparameters.
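To make the CLIP-to-SAM handoff concrete, here is a minimal sketch of how point and box prompts might be extracted from a coarse localization result, assuming that result is a per-pixel anomaly score map. The thresholding rule, 4-connectivity, and peak-point selection below are illustrative assumptions for this sketch, not the paper's exact MMR procedure.

```python
from collections import deque

def extract_prompts(anomaly_map, threshold=0.5):
    """Derive SAM-style prompts from a 2-D anomaly score map (list of lists).

    For each connected region of above-threshold pixels, emit:
      - a point prompt (x, y) at the region's peak anomaly score, and
      - a box prompt (x_min, y_min, x_max, y_max) tightly enclosing it.
    """
    h, w = len(anomaly_map), len(anomaly_map[0])
    seen = [[False] * w for _ in range(h)]
    points, boxes = [], []
    for sy in range(h):
        for sx in range(w):
            if anomaly_map[sy][sx] <= threshold or seen[sy][sx]:
                continue
            # BFS flood fill to collect one connected anomalous region.
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and anomaly_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Point prompt: pixel with the peak anomaly score in the region.
            py, px = max(region, key=lambda p: anomaly_map[p[0]][p[1]])
            points.append((px, py))
            # Box prompt: tight bounding box around the region.
            ys = [p[0] for p in region]
            xs = [p[1] for p in region]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return points, boxes
```

In a real pipeline, the returned points and boxes would be fed to SAM's prompt encoder, with SAM's output masks then intersected or fused with CLIP's coarse segmentation at multiple levels.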