May 29, 2024 | Ruchika Chavhan, Da Li, Timothy Hospedales
ConceptPrune is a training-free method for concept editing in diffusion models that efficiently removes undesirable concepts such as artistic styles, nudity, specific objects, and gender bias. The method identifies "skilled neurons" in the diffusion model that are responsible for generating a given concept, then prunes them to erase that concept. Because it targets only these neurons, ConceptPrune achieves effective concept erasure while pruning only about 0.12% of the total weights. It also supports erasing multiple concepts at once and is robust to both white-box and black-box adversarial attacks. Experiments across a range of concepts show that ConceptPrune outperforms existing methods at erasing concepts while preserving the model's general image generation capabilities. Since it requires no fine-tuning, ConceptPrune offers an efficient and promising way to mitigate the risks posed by large-scale text-to-image diffusion models.
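The "identify skilled neurons, then prune them" idea can be illustrated with a minimal, hypothetical sketch. It assumes that a neuron's relevance to a concept is scored as the gap between its mean absolute activation on concept prompts and on neutral reference prompts. It also assumes that pruning a neuron means zeroing its column in the projection matrix that reads from it. The names `skilled_neurons`, `prune_neurons`, and `top_fraction` are illustrative inventions. This is not the authors' scoring rule or implementation.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def skilled_neurons(
    concept_acts: Sequence[Sequence[float]],
    reference_acts: Sequence[Sequence[float]],
    top_fraction: float,
) -> List[int]:
    """Hypothetical scoring: pick the neurons whose mean |activation| on
    concept prompts exceeds their mean |activation| on reference prompts
    by the largest margin. Each inner sequence is one prompt's activation
    vector over the layer's hidden neurons."""
    n = len(concept_acts[0])

    def mean_abs(acts: Sequence[Sequence[float]], j: int) -> float:
        return sum(abs(a[j]) for a in acts) / len(acts)

    # Concept-specificity margin for every hidden neuron.
    margins = [mean_abs(concept_acts, j) - mean_abs(reference_acts, j) for j in range(n)]
    # Keep the top `top_fraction` of neurons (at least one) by margin.
    k = max(1, round(top_fraction * n))
    ranked = sorted(range(n), key=lambda j: margins[j], reverse=True)
    # Only neurons that fire more strongly for the concept count as "skilled".
    return sorted(j for j in ranked[:k] if margins[j] > 0)


def prune_neurons(weight: Matrix, neurons: Sequence[int]) -> Matrix:
    """Return a copy of `weight` (rows = outputs, columns = hidden neurons)
    with the columns of the selected neurons set to zero, so those neurons
    no longer contribute to the layer's output."""
    drop = set(neurons)
    return [[0.0 if j in drop else w for j, w in enumerate(row)] for row in weight]


if __name__ == "__main__":
    # Toy activations for 2 prompts over 3 hidden neurons.
    concept = [[5.0, 0.1, 1.0], [4.0, 0.2, 1.0]]
    reference = [[0.5, 0.1, 1.0], [0.4, 0.2, 1.0]]
    picked = skilled_neurons(concept, reference, top_fraction=1 / 3)
    print(picked)  # [0]
    print(prune_neurons([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], picked))
    # [[0.0, 2.0, 3.0], [0.0, 5.0, 6.0]]
```

In this toy run only neuron 0 responds much more strongly to the concept prompts, so only its column is zeroed. The other weights are left untouched. A small `top_fraction` is what keeps the pruned share of weights tiny, in the spirit of the roughly 0.12% figure reported above.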