Rethinking Reverse Distillation for Multi-Modal Anomaly Detection

2024 | Zhihao Gu, Jiangning Zhang, Liang Liu, Xu Chen, Jinlong Peng, Zhenye Gan, Guannan Jiang, Annan Shu, Yabiao Wang, Lizhuang Ma
This paper introduces a novel Multi-Modal Reverse Distillation (MMRD) paradigm for multi-modal anomaly detection. The MMRD framework pairs a frozen multi-modal teacher encoder, which generates distillation targets, with a learnable student decoder, which restores multi-modal representations. The teacher extracts complementary visual features from the different modalities through a siamese architecture and fuses them parameter-free to form the distillation targets; the student learns modality-related priors from normal training data and interacts with them to form multi-modal representations for target reconstruction.

Extensive experiments show that MMRD outperforms recent state-of-the-art methods in both anomaly detection and localization on the MVTec 3D-AD and Eyecandies benchmarks. The approach is flexible, handling images, depth, and surface normals, and also generalizes to another distillation paradigm, forward distillation. Compared with several state-of-the-art multi-modal detectors, including AST, M3DM, PatchCore, and Eyecandy, MMRD leads in four out of five AD metrics while requiring less training time and offering faster inference. Ablation studies show that integrating auxiliary modalities and multi-modal interaction significantly improves performance, and visualizations illustrate how the method suppresses sensitivity to anomaly-free patterns and improves localization accuracy.

The main contributions are: a novel reverse distillation paradigm, the design of a frozen multi-modal teacher encoder, the design of a learnable multi-modal student decoder, and state-of-the-art results on two multi-modal anomaly detection benchmarks.
The MMRD approach is effective in detecting anomalies that are invisible in RGB images by integrating auxiliary modalities such as depth and surface normals.
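The core scoring idea described above — a frozen teacher produces fused multi-modal targets, a student reconstructs them, and the mismatch between the two yields an anomaly map — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mean fusion, cosine-distance scoring, and all shapes and names are assumptions for demonstration.

```python
import numpy as np

def fuse_parameter_free(features):
    """Fuse per-modality teacher feature maps of shape (C, H, W) by
    averaging — one simple parameter-free fusion rule (assumed here)."""
    return np.mean(np.stack(features, axis=0), axis=0)

def anomaly_map(teacher_target, student_recon, eps=1e-8):
    """Per-location anomaly score: 1 minus the channel-wise cosine
    similarity between the teacher target and the student's
    reconstruction. Large values flag poorly reconstructed regions."""
    t = teacher_target / (np.linalg.norm(teacher_target, axis=0) + eps)
    s = student_recon / (np.linalg.norm(student_recon, axis=0) + eps)
    return 1.0 - np.sum(t * s, axis=0)

# Toy stand-ins for teacher features from two modalities (e.g. RGB, depth).
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
rgb_feat = rng.normal(size=(C, H, W))
depth_feat = rng.normal(size=(C, H, W))
target = fuse_parameter_free([rgb_feat, depth_feat])

# On normal data a well-trained student reconstructs the target closely,
# so the anomaly map is near zero everywhere.
amap = anomaly_map(target, target.copy())
print(amap.shape, float(amap.max()))
```

The intuition matches the paper's narrative: because the student only learns normal-data priors, anomalous regions reconstruct poorly and light up in the map, while anomaly-free patterns are suppressed.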