Understanding Pruning for Robust Concept Erasing in Diffusion Models

This paper addresses the problem of text-to-image diffusion models generating undesirable content, such as NSFW images and copyrighted artworks. Existing methods that fine-tune a model to erase problematic concepts often lack robustness: the erased concepts can be reactivated by carefully crafted adversarial prompts. To address this, the authors introduce a pruning-based strategy for concept erasing. They identify concept-correlated neurons that are sensitive to adversarial prompts and selectively prune the critical parameters associated with those neurons. According to the reported experiments, this substantially reduces the sensitivity of concept-related neurons, improves concept erasure rates under adversarial prompts, and maintains or improves generation quality for other concepts. The pruning strategy is flexible and can be combined with existing concept-erasing techniques, which makes it a practical way to improve the safety and reliability of diffusion models in real-world applications.
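
The two steps described above (score neurons by how strongly they respond to the concept, then prune the parameters that matter most for those neurons) can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not give their scoring rule or pruning criterion, so the ReLU layer, the activation-gap score, the contribution-based choice of "critical" weights, and all function and parameter names below are assumptions made purely for illustration.

```python
import numpy as np


def concept_sensitivity(W, concept_inputs, neutral_inputs):
    """Assumed score: mean ReLU activation on concept inputs minus neutral inputs."""
    concept_act = np.maximum(concept_inputs @ W.T, 0.0).mean(axis=0)
    neutral_act = np.maximum(neutral_inputs @ W.T, 0.0).mean(axis=0)
    return concept_act - neutral_act


def prune_concept_neurons(W, scores, concept_inputs, num_neurons, weight_fraction):
    """Zero the weights that contribute most to the top-scoring neurons.

    'Critical' is an assumption here: the weights with the largest
    |weight * mean concept input| within each selected neuron's row.
    """
    pruned = W.copy()
    top = np.argsort(scores)[::-1][:num_neurons]            # most concept-sensitive neurons
    k = max(1, int(round(weight_fraction * W.shape[1])))    # weights to drop per neuron
    mean_input = concept_inputs.mean(axis=0)
    for n in top:
        contribution = np.abs(pruned[n] * mean_input)
        idx = np.argsort(contribution)[::-1][:k]
        pruned[n, idx] = 0.0
    return pruned, top


# Toy layer: 4 neurons, 6 input features; neuron i reads input feature i.
W = np.eye(4, 6)
# Feature 2 stands in for the concept signal (hypothetical data).
concept_inputs = np.array([[0.1, 0.1, 1.0, 0.1, 0.0, 0.0],
                           [0.1, 0.1, 0.8, 0.1, 0.0, 0.0]])
neutral_inputs = np.array([[0.1, 0.1, 0.0, 0.1, 0.0, 0.0],
                           [0.1, 0.1, 0.1, 0.1, 0.0, 0.0]])

scores = concept_sensitivity(W, concept_inputs, neutral_inputs)
pruned_W, top = prune_concept_neurons(W, scores, concept_inputs,
                                      num_neurons=1, weight_fraction=0.2)
print("pruned neurons:", top.tolist())
```

In this toy setup only the neuron tied to the concept feature is modified, and the remaining rows of the weight matrix are left untouched, which mirrors the summary's claim that generation quality for other concepts is preserved. A real diffusion model would need activations collected from its actual layers under adversarial prompts.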

Pruning for Robust Concept Erasing in Diffusion Models

26 May 2024 | Tianyun Yang, Juan Cao, Chang Xu