The paper "Do LLMs Understand Visual Anomalies? Uncovering LLM’s Capabilities in Zero-shot Anomaly Detection" by Jiaqi Zhu, Shaofeng Cai, Fang Deng, and Junran Wu addresses the challenges of zero-shot visual anomaly detection (VAD) using large vision-language models (LVLMs). The authors propose ALFA, a training-free approach that leverages the zero-shot capabilities of large language models (LLMs) to generate informative and adaptive anomaly prompts. ALFA includes a run-time prompt adaptation strategy and a contextual scoring mechanism to mitigate cross-semantic ambiguity, and a fine-grained aligner to enable precise pixel-level anomaly localization. Extensive experiments on the MVTec and VisA datasets demonstrate that ALFA significantly outperforms state-of-the-art zero-shot VAD methods, achieving 12.1% and 8.9% improvements in PRO scores for pixel-level and image-level anomaly detection, respectively. The paper also explores the interpretability of ALFA's decision-making process and its performance in few-shot settings, further validating its effectiveness and generalization capabilities.The paper "Do LLMs Understand Visual Anomalies? Uncovering LLM’s Capabilities in Zero-shot Anomaly Detection" by Jiaqi Zhu, Shaofeng Cai, Fang Deng, and Junran Wu addresses the challenges of zero-shot visual anomaly detection (VAD) using large vision-language models (LVLMs). The authors propose ALFA, a training-free approach that leverages the zero-shot capabilities of large language models (LLMs) to generate informative and adaptive anomaly prompts. ALFA includes a run-time prompt adaptation strategy and a contextual scoring mechanism to mitigate cross-semantic ambiguity, and a fine-grained aligner to enable precise pixel-level anomaly localization. Extensive experiments on the MVTec and VisA datasets demonstrate that ALFA significantly outperforms state-of-the-art zero-shot VAD methods, achieving 12.1% and 8.9% improvements in PRO scores for pixel-level and image-level anomaly detection, respectively. The paper also explores the interpretability of ALFA's decision-making process and its performance in few-shot settings, further validating its effectiveness and generalization capabilities.