29 Jan 2024 | Shengze Li, Jianjian Cao, Peng Ye, Yuhan Ding, Chongjun Tu, Tao Chen
ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly Segmentation
This paper proposes ClipSAM, a framework for Zero-Shot Anomaly Segmentation (ZSAS) that combines the strengths of CLIP and SAM. CLIP localizes anomalies and produces a rough segmentation, and SAM then refines that segmentation. ClipSAM has two key components: the Unified Multi-scale Cross-modal Interaction (UMCI) module and the Multi-level Mask Refinement (MMR) module. UMCI lets text and visual features interact at multiple scales, while MMR uses the localization information from CLIP to guide SAM toward accurate masks. The framework achieves state-of-the-art performance on the MVTec-AD and VisA datasets, outperforming existing methods on multiple metrics. These results demonstrate the effectiveness of combining CLIP and SAM for ZSAS and point to a new direction for the task: exploiting the complementary characteristics of different foundation models.
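The CLIP-to-SAM hand-off can be illustrated with a deliberately simplified sketch. It is not the paper's UMCI or MMR module. It only shows the general idea under stated assumptions. Patch embeddings (assumed to be already projected into CLIP's joint text-image space) are compared with two text-prompt embeddings ("normal" vs. "anomalous") to get a per-patch anomaly probability. The map is then thresholded, and each connected anomalous region becomes a box prompt plus a peak-point prompt. The function names `anomaly_map` and `prompts_from_map`, the temperature of 0.07, the 0.5 threshold, and the toy embeddings are all hypothetical choices for illustration.

```python
from collections import deque

import numpy as np


def anomaly_map(patch_feats, text_normal, text_anomal, temperature=0.07):
    """Per-patch anomaly probability from CLIP-style features (simplified sketch).

    patch_feats: (H, W, D) patch embeddings, assumed to live in the joint space.
    text_normal, text_anomal: (D,) embeddings of a "normal" and an "anomalous"
    prompt (hypothetical; ClipSAM's actual prompts and fusion are more elaborate).
    """
    # L2-normalise so dot products become cosine similarities
    p = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = np.stack([text_normal, text_anomal])
    t = t / np.linalg.norm(t, axis=-1, keepdims=True)
    logits = p @ t.T / temperature                    # (H, W, 2)
    logits -= logits.max(axis=-1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)        # softmax over the two prompts
    return probs[..., 1]                              # P("anomalous") per patch


def prompts_from_map(amap, thresh=0.5):
    """Turn a rough anomaly map into SAM-style prompts, one per 4-connected region.

    Each prompt holds an XYXY box and the (x, y) of the region's highest score.
    """
    mask = amap > thresh
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    prompts = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            # Breadth-first flood fill of one connected anomalous region
            queue = deque([(y, x)])
            seen[y, x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            ys, xs = zip(*pixels)
            box = (min(xs), min(ys), max(xs), max(ys))
            # The highest-scoring pixel in the region serves as a positive point prompt
            py, px = max(pixels, key=lambda q: amap[q])
            prompts.append({"box": box, "point": (px, py)})
    return prompts


# Toy example: a 4x4 patch grid with made-up 3-D embeddings
text_normal = np.array([1.0, 0.0, 0.0])
text_anomal = np.array([0.0, 1.0, 0.0])
feats = np.tile(text_normal, (4, 4, 1))                # every patch looks "normal"
feats[0, 0] = feats[0, 1] = feats[3, 3] = text_anomal  # three "defective" patches
amap = anomaly_map(feats, text_normal, text_anomal)
prompts = prompts_from_map(amap)
print(prompts)
```

In a real pipeline, the patch-level map would first be upsampled to image resolution. The boxes (XYXY) and points (x, y) could then be passed to SAM, for example through `SamPredictor.predict(point_coords=..., point_labels=..., box=...)` in the `segment_anything` package. A multi-level refinement such as MMR would fuse the resulting masks with the rough CLIP segmentation rather than using either one alone.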