FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

26 Jul 2024 | Zhao Peng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang
FiLo is a zero-shot anomaly detection (ZSAD) method that improves both anomaly detection and localization. It consists of two components: adaptively learned Fine-Grained Description (FG-Des) and position-enhanced High-Quality Localization (HQ-Loc).

FG-Des uses Large Language Models (LLMs) to generate detailed anomaly descriptions for each object category, replacing the generic "normal" and "abnormal" descriptions with specific anomaly content. These descriptions are combined with adaptively learned text templates to build more accurate text prompts, which improves detection accuracy and makes the results more interpretable.

HQ-Loc combines three elements: preliminary localization with Grounding DINO, position-enhanced text prompts, and a Multi-scale Multi-shape Cross-modal Interaction (MMCI) module. The MMCI module processes the patch features extracted by the Image Encoder with convolutional kernels of different sizes and shapes, so that anomalies of various sizes and shapes are localized more accurately.

Experiments on datasets such as MVTec and VisA show that FiLo significantly improves ZSAD performance in both detection and localization. On the VisA dataset, FiLo achieves an image-level AUC of 83.9% and a pixel-level AUC of 95.9%, outperforming other ZSAD methods. Ablation studies show that both FG-Des and HQ-Loc contribute to the improvement.
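The way FG-Des and the position-enhanced prompts replace a generic "abnormal" label can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the function name `build_prompts`, the wording of the template, the `[V1][V2][V3]` string standing in for the adaptively learned template tokens (which would be trainable embeddings, not text), and the hand-written descriptions standing in for LLM output.

```python
def build_prompts(category, anomaly_descriptions, position=None,
                  learned_prefix="[V1][V2][V3]"):
    """Compose one normal prompt and several fine-grained abnormal prompts.

    Hypothetical helper: `learned_prefix` is a literal placeholder for
    learned template tokens, and `position` is a coarse location string
    such as one derived from a preliminary bounding box.
    """
    # Position enhancement: append the location only when one is known.
    location = f" at the {position}" if position else ""
    normal = f"{learned_prefix} normal {category}"
    # One abnormal prompt per fine-grained description, instead of a
    # single generic "abnormal <category>" prompt.
    abnormal = [
        f"{learned_prefix} abnormal {category} with {desc}{location}"
        for desc in anomaly_descriptions
    ]
    return normal, abnormal


# Illustrative stand-ins for LLM-generated descriptions.
normal, abnormal = build_prompts(
    "capsule", ["a crack", "a scratch"], position="top left"
)
```

Because each abnormal prompt names a specific defect, the prompt that best matches an image also serves as a readable explanation of what was detected.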
FiLo's approach is effective for zero-shot anomaly detection and localization, achieving state-of-the-art performance.
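The multi-scale, multi-shape idea behind the MMCI module can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the kernel shapes in `KERNEL_SHAPES` are hypothetical, a uniform average replaces the learned convolution weights, and the names `box_filter`, `anomaly_map`, and the `temperature` value are invented for this example. Each filtered patch feature is compared with a normal and an abnormal text embedding, and the per-kernel abnormal probabilities are averaged.

```python
import math

# Hypothetical (height, width) kernel shapes; the summary only states
# that the kernels differ in size and shape.
KERNEL_SHAPES = [(1, 1), (3, 3), (1, 5), (5, 1)]


def box_filter(grid, kh, kw):
    """Average feature vectors over a kh x kw window with zero padding.

    A uniform average stands in for a learned convolution kernel.
    """
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(-(kh // 2), kh // 2 + 1):
                for dj in range(-(kw // 2), kw // 2 + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        for k in range(c):
                            out[i][j][k] += grid[y][x][k] / (kh * kw)
    return out


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def anomaly_map(grid, normal_text, abnormal_text, temperature=0.1):
    """Average, over all kernel shapes, the softmax probability that each
    filtered patch matches the abnormal text embedding."""
    h, w = len(grid), len(grid[0])
    total = [[0.0] * w for _ in range(h)]
    for kh, kw in KERNEL_SHAPES:
        filtered = box_filter(grid, kh, kw)
        for i in range(h):
            for j in range(w):
                s_n = cosine(filtered[i][j], normal_text) / temperature
                s_a = cosine(filtered[i][j], abnormal_text) / temperature
                m = max(s_n, s_a)  # subtract the max for numerical stability
                e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
                total[i][j] += e_a / (e_n + e_a)
    return [[v / len(KERNEL_SHAPES) for v in row] for row in total]


# Toy 5x5 grid of 2-D patch features with a horizontal "defect" stripe.
grid = [[[1.0, 0.0] for _ in range(5)] for _ in range(5)]
grid[2] = [[0.0, 1.0] for _ in range(5)]
amap = anomaly_map(grid, normal_text=[1.0, 0.0], abnormal_text=[0.0, 1.0])
```

In this toy case the 1x5 kernel fits the horizontal stripe exactly, while the 3x3 and 5x1 kernels dilute it with surrounding normal patches. This is the motivation for mixing kernel shapes: elongated and compact anomalies are each matched by at least one kernel.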