R.A.C.E. : Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model

23 Jul 2024 | Changhoon Kim, Kyle Min, Yezhou Yang
RACE (Robust Adversarial Concept Erase) is a novel approach designed to enhance the robustness of concept erasure in text-to-image (T2I) diffusion models against adversarial attacks. The method introduces an adversarial training framework to identify and mitigate adversarial text embeddings that can reconstruct erased concepts, significantly reducing the Attack Success Rate (ASR). RACE achieves a 30 percentage point reduction in ASR for the "nudity" concept against the leading white-box attack method. Evaluations show that RACE effectively defends against both white-box and black-box attacks, marking a significant advancement in protecting T2I diffusion models from generating inappropriate or misleading imagery.
RACE's method is efficient and integrates seamlessly into the concept erasure workflow, enhancing the resilience of T2I models against adversarial manipulations. The approach is validated through extensive experiments, demonstrating its effectiveness in reducing ASR across various target concepts, including artistic, explicit, and object categories. RACE also addresses the trade-off between robustness and image quality, offering a computationally efficient defense mechanism for T2I diffusion models against adversarial threats.
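The core idea the abstract describes is a min-max game: an inner step searches for a small perturbation of the text embedding that makes the model evoke the supposedly erased concept again, and an outer step fine-tunes the model so that even these worst-case embeddings fail. The following is a minimal NumPy sketch of the inner search only, using a projected-gradient (PGD-style) ascent on a toy linear "concept score" surrogate; the score function, epsilon ball, step size, and iteration count are all illustrative assumptions, not the paper's actual diffusion-model loss.

```python
import numpy as np

def concept_score(emb, concept_vec):
    # Toy surrogate for "how strongly this embedding evokes the erased
    # concept" (the real objective would involve the diffusion model).
    return float(emb @ concept_vec)

def find_adversarial_embedding(emb, concept_vec, eps=0.5, step=0.1, iters=20):
    """Inner maximization: search within an L-infinity ball of radius eps
    around the benign embedding for a perturbation that maximizes the
    concept score, i.e. reconstructs the erased concept."""
    delta = np.zeros_like(emb)
    for _ in range(iters):
        # For a linear score, the gradient w.r.t. delta is concept_vec;
        # take a sign-gradient step and project back into the eps ball.
        grad = concept_vec
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return emb + delta

rng = np.random.default_rng(0)
concept = rng.normal(size=8)
concept /= np.linalg.norm(concept)
benign = rng.normal(size=8)

adv = find_adversarial_embedding(benign, concept)
# The adversarial embedding scores strictly higher on the erased concept.
print(concept_score(adv, concept) > concept_score(benign, concept))  # True
```

In the full defense, the outer loop would then update the model's parameters to minimize the concept score at these adversarial embeddings, alternating the two steps as in standard adversarial training.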