FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

The paper introduces FiLo, a zero-shot anomaly detection (ZSAD) method that improves on existing approaches in both anomaly detection and localization accuracy. FiLo consists of two components: Fine-Grained Description (FG-Des) and High-Quality Localization (HQ-Loc). FG-Des uses Large Language Models (LLMs) to generate fine-grained anomaly descriptions for each category, which improves the accuracy and interpretability of anomaly detection. HQ-Loc combines Grounding DINO for preliminary localization, position-enhanced text prompts, and a Multi-scale Multi-shape Cross-modal Interaction (MMCI) module to localize anomalies of different sizes and shapes. Experiments on the MVTec and VisA datasets show that FiLo outperforms existing ZSAD methods, reaching state-of-the-art performance with an image-level AUC of 83.9% and a pixel-level AUC of 95.9% on VisA.
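
The general idea behind description-driven zero-shot scoring can be illustrated with a small sketch. This is not FiLo's implementation: it assumes patch features and text embeddings have already been produced by some vision-language encoder, and the function names, the max-over-descriptions pooling, and the temperature value are all hypothetical choices made for illustration. Each patch is compared against embeddings of "normal" and "abnormal" descriptions, and a two-way softmax gives a per-patch anomaly probability.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anomaly_scores(patch_feats, normal_texts, abnormal_texts, temperature=0.07):
    """Return one anomaly probability per patch.

    patch_feats, normal_texts, abnormal_texts: lists of feature vectors
    (assumed to live in a shared image-text embedding space).
    temperature: softmax temperature (illustrative value, an assumption).
    """
    scores = []
    for patch in patch_feats:
        # Best-matching description within each group (assumed pooling rule).
        s_norm = max(cosine(patch, t) for t in normal_texts)
        s_abn = max(cosine(patch, t) for t in abnormal_texts)
        # Two-way softmax over (normal, abnormal) similarities.
        z_norm = math.exp(s_norm / temperature)
        z_abn = math.exp(s_abn / temperature)
        scores.append(z_abn / (z_norm + z_abn))
    return scores


def image_score(patch_scores):
    """Image-level score taken as the maximum patch score (one common choice)."""
    return max(patch_scores)


# Toy 2-D embeddings: the first axis stands for "normal", the second for "abnormal".
normal_texts = [[1.0, 0.0]]
abnormal_texts = [[0.0, 1.0], [0.2, 1.0]]
patches = [[1.0, 0.1], [0.1, 1.0]]

patch_scores = anomaly_scores(patches, normal_texts, abnormal_texts)
print(patch_scores, image_score(patch_scores))
```

In this toy setup the second patch lies close to the "abnormal" descriptions and therefore receives the higher score. The per-patch scores play the role of a coarse anomaly map, while the maximum serves as an image-level score of the kind an image-level AUC would be computed over.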

FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

26 Jul 2024 | Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang