DEFENSIVE UNLEARNING WITH ADVERSARIAL TRAINING FOR ROBUST CONCEPT ERASURE IN DIFFUSION MODELS

14 Jun 2024 | Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, Sijia Liu
This paper introduces AdvUnlearn, a robust unlearning framework for diffusion models (DMs) that integrates adversarial training (AT) into concept erasure so that erased concepts are harder to recover through adversarial prompt attacks. The central challenge it addresses is preserving image generation quality while making unlearning robust. The key contributions are a bi-level optimization (BLO)-based scheme for integrating AT into unlearning, the finding that the text encoder is a more suitable module to robustify than the UNet, and a utility-retaining regularization that balances concept erasure robustness against model utility.
The reported results show significant improvements in robustness and utility across several unlearning scenarios, including nudity, object, and style concept erasure. The text encoder trained by AdvUnlearn can be used as a plug-and-play robust unlearner for different DM types, which broadens its applicability. The framework also achieves a balanced trade-off between robustness and model utility, making it a promising approach to improving the safety and effectiveness of DMs in real-world applications.
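The bi-level structure mentioned above can be sketched in generic form. This is an illustrative formulation only, not taken from the summary: the symbols (theta for the parameters being updated, c for the prompt of the concept to erase, C(c) for the set of allowed adversarial perturbations of c, gamma for the regularization weight) and the sign convention for the attack objective are all assumptions.

```latex
% Generic bi-level sketch of adversarially trained unlearning (assumed notation).
% Upper level: update the model so the concept stays erased under the
%              worst-case prompt, plus a utility-retaining regularizer.
% Lower level: the attacker searches for the perturbed prompt c^* that best
%              recovers the erased concept (here: minimizes an attack loss).
\begin{aligned}
\min_{\theta}\quad
  & \ell_{\mathrm{u}}(\theta,\, c^{*}) \;+\; \gamma\, \ell_{\mathrm{r}}(\theta) \\
\text{s.t.}\quad
  & c^{*} \in \arg\min_{c' \in \mathcal{C}(c)} \ \ell_{\mathrm{atk}}(\theta,\, c')
\end{aligned}
```

Under these assumptions, the unlearning loss is evaluated at the adversarial prompt, the attack loss is lower when the perturbed prompt regenerates the erased concept more successfully, and the regularizer penalizes loss of generation quality on prompts unrelated to that concept. Setting gamma to zero would recover plain adversarial training with no explicit utility term.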