DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models

5 Feb 2024 | Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan
This paper explores the detectability of backdoor attacks on diffusion models from both the defender's and the attacker's side, proposing a distribution discrepancy-based trigger detection mechanism and a detection-evading trigger design. The study first analyzes the properties of trigger patterns used in existing diffusion backdoor attacks and shows that the distribution discrepancy between poisoned and benign input noise is what makes Trojan detection possible. Building on this, it develops a low-cost trigger detection mechanism that identifies poisoned input noise. On the attack side, the paper proposes a backdoor attack strategy that learns stealthy triggers able to evade this detection scheme, together with a two-step training procedure: the first step learns the detection-evading trigger, and the second trains the backdoored diffusion model, incorporating noise consistency optimization to improve performance.
Empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of both the trigger detection and the detection-evading attack. The distribution discrepancy-based detection method achieves a 100% detection rate for existing Trojan triggers, while the detection-evading trigger design achieves a nearly 100% detection pass rate with high attack and benign performance. The results show that the proposed methods significantly enhance the stealthiness of backdoor triggers and improve both benign and attack performance. The study highlights the importance of understanding the detectability of backdoor attacks on diffusion models and provides a systematic approach to this challenge.
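To make the idea of distribution discrepancy-based detection concrete, the following is a minimal sketch, not the paper's actual detector. It assumes that benign input noise is drawn from a standard Gaussian and that a trigger added to that noise shifts its statistics. The score used here (the KL divergence between a Gaussian fitted to the input and the standard normal), the function names, the patch-style trigger, and the threshold of 0.01 are all illustrative assumptions; the summary does not state the exact score or threshold the authors use.

```python
import numpy as np


def gaussian_kl_to_standard_normal(noise: np.ndarray) -> float:
    """KL( N(mu, var) || N(0, 1) ), with mu and var fitted to the flattened input.

    Assumption: this closed-form Gaussian KL is a stand-in for whatever
    discrepancy measure the paper actually uses.
    """
    x = np.asarray(noise, dtype=np.float64).ravel()
    mu = x.mean()
    var = x.var()
    return float(0.5 * (var + mu**2 - 1.0 - np.log(var)))


def looks_poisoned(noise: np.ndarray, threshold: float = 0.01) -> bool:
    """Flag input noise whose fitted Gaussian is far from N(0, 1).

    The threshold is hypothetical and would need calibration on benign noise.
    """
    return gaussian_kl_to_standard_normal(noise) > threshold


# Benign input: standard Gaussian noise with an image-like shape.
rng = np.random.default_rng(0)
clean = rng.standard_normal((3, 32, 32))

# Hypothetical patch trigger: a constant offset in one corner of the noise.
trigger = np.zeros_like(clean)
trigger[:, -8:, -8:] = 4.0
poisoned = clean + trigger

print("clean score:   ", gaussian_kl_to_standard_normal(clean))
print("poisoned score:", gaussian_kl_to_standard_normal(poisoned))
print("clean flagged:   ", looks_poisoned(clean))
print("poisoned flagged:", looks_poisoned(poisoned))
```

A check of this kind is cheap because it only inspects the input noise and never runs the diffusion model. It also suggests why a learned trigger could evade it: a trigger optimized to keep the poisoned noise statistically close to the benign distribution would produce a score below the threshold.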