19 Mar 2024 | Chaoqin Huang, Aofan Jiang, Jinghao Feng, Ya Zhang, Xinchao Wang, Yanfeng Wang
This paper proposes a novel lightweight multi-level adaptation and comparison framework that repurposes the CLIP model for medical anomaly detection. The approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise refinement of visual features across levels. This multi-level adaptation is guided by multi-level, pixel-wise visual-language feature alignment losses, which recalibrate the model's focus from object semantics in natural imagery to anomaly identification in medical images. The adapted features generalize across diverse medical data types, even in zero-shot scenarios where the model encounters medical modalities and anatomical regions unseen during training.

Experiments on medical anomaly detection benchmarks show that the method significantly surpasses current state-of-the-art models, with average AUC improvements of 6.24% and 7.33% for anomaly classification, and 2.03% and 2.37% for anomaly segmentation, under the zero-shot and few-shot settings, respectively. The multi-level feature adaptation and comparison architecture lets the model discern both global and local anomalies through visual-language feature alignment. Evaluated on a challenging medical anomaly detection benchmark spanning datasets from five distinct medical modalities and anatomical regions, the method outperforms existing approaches, underscoring the effectiveness of multi-level feature adaptation for generalizable medical anomaly detection.
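To make the two core ingredients concrete, here is a minimal PyTorch sketch of a residual adapter and a per-level pixel-wise visual-language alignment loss. This is an illustrative reading of the summary above, not the authors' implementation: the names `ResidualAdapter` and `alignment_loss`, the bottleneck width, the temperature, and the two-class "normal"/"abnormal" prompt embeddings are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualAdapter(nn.Module):
    """Lightweight bottleneck adapter attached to one (frozen) encoder stage.

    Hypothetical sketch: dimensions and structure are illustrative, not
    the paper's exact design.
    """
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the pre-trained CLIP features,
        # while the small trainable branch learns a task-specific correction.
        return x + self.up(F.relu(self.down(x)))


def alignment_loss(patch_feats: torch.Tensor,
                   text_feats: torch.Tensor,
                   mask: torch.Tensor) -> torch.Tensor:
    """Pixel-wise visual-language alignment at a single feature level.

    patch_feats: (B, N, D) adapted patch embeddings from one encoder level
    text_feats:  (2, D) embeddings of "normal"/"abnormal" text prompts
    mask:        (B, N) binary anomaly labels at patch resolution
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    # Cosine similarity to each prompt, scaled by a temperature
    # (0.07 is a common CLIP-style choice, assumed here).
    logits = (patch_feats @ text_feats.t()) / 0.07   # (B, N, 2)
    return F.cross_entropy(logits.reshape(-1, 2), mask.reshape(-1).long())


# Multi-level supervision: the total training loss would sum the
# alignment loss over every adapted level, e.g.
#   total = sum(alignment_loss(f, text_feats, mask) for f in level_feats)
```

At inference, the same per-patch "abnormal" similarity maps can be read off each level and fused into a segmentation map, while their pooled scores serve image-level classification, consistent with the global-and-local comparison described above.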