ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning

29 May 2024 | Ruchika Chavhan, Da Li, Timothy Hospedales
The paper "ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning" addresses the challenges of generating unsafe content, copyright violations, and societal biases in large-scale text-to-image diffusion models. It introduces a training-free method called *ConceptPrune* that identifies and prunes critical regions within pre-trained models responsible for generating undesirable concepts.

This approach involves identifying "skilled neurons" in the feed-forward layers of diffusion models, which are neurons strongly activated when a specific concept is present. By pruning these skilled neurons, the method effectively removes target concepts while maintaining the model's image-generation capabilities and robustness against adversarial attacks. The paper demonstrates the effectiveness of ConceptPrune through experiments on various concepts, including artistic styles, nudity, object erasure, and gender debiasing. It shows that pruning a small fraction (approximately 0.12%) of the total weights suffices for efficient concept removal. The method is also evaluated for its robustness against white-box and black-box adversarial attacks, demonstrating strong performance and resistance to adversarial prompts. The authors compare ConceptPrune with existing concept editing and unlearning methods, highlighting its advantages in terms of efficiency, effectiveness, and robustness. The paper concludes by discussing the limitations and potential applications of ConceptPrune in making diffusion models more socially responsible.
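The skilled-neuron idea summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes we have already collected feed-forward hidden activations for prompts containing the target concept and for neutral reference prompts, scores each neuron by how much more strongly it fires on concept prompts, and then zeroes the corresponding output rows of the feed-forward layer so those neurons no longer contribute. All function names, shapes, and the scoring rule here are hypothetical simplifications.

```python
import numpy as np

def skilled_neuron_scores(acts_concept, acts_neutral):
    """Score each neuron by how much more it activates on concept prompts.

    acts_concept, acts_neutral: (num_samples, num_neurons) arrays of
    feed-forward hidden activations (assumed pre-collected).
    """
    return acts_concept.mean(axis=0) - acts_neutral.mean(axis=0)

def prune_skilled_neurons(weight_out, scores, prune_fraction=0.0012):
    """Zero the output weights of the top-scoring ("skilled") neurons.

    weight_out: (num_neurons, dim_out) second linear layer of the FFN;
    zeroing a row silences that neuron's downstream contribution.
    prune_fraction mirrors the paper's ~0.12% figure, but the exact
    selection rule here is an illustrative assumption.
    """
    k = max(1, int(prune_fraction * scores.size))
    skilled = np.argsort(scores)[-k:]  # indices of the k top-scoring neurons
    pruned = weight_out.copy()
    pruned[skilled, :] = 0.0
    return pruned, skilled

# Toy usage with random data standing in for real activations.
rng = np.random.default_rng(0)
acts_concept = rng.normal(1.0, 1.0, size=(64, 1000))
acts_neutral = rng.normal(0.0, 1.0, size=(64, 1000))
weight_out = rng.normal(size=(1000, 320))

scores = skilled_neuron_scores(acts_concept, acts_neutral)
pruned_w, skilled = prune_skilled_neurons(weight_out, scores, prune_fraction=0.01)
```

Because pruning only zeroes a tiny set of rows in existing weight matrices, no retraining is required, which is what makes the approach training-free.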