3 Jan 2025 | Dingcheng Yang1,*, Yang Bai2,*, Xiaojun Jia3, Yang Liu3, Xiaochun Cao4, Wenjian Yu1,†
This paper explores the multi-modal vulnerability of diffusion models, which are widely used for image generation. The authors visualize the text and image feature spaces embedded by diffusion models and observe a significant difference: text features are chaotic, while image features cluster according to their subjects. This misalignment points to potential robustness issues in diffusion models. Based on this observation, the authors propose MMP-Attack, a method that leverages multi-modal priors to manipulate the generation results of diffusion models by appending a specific suffix to the original prompt. The goal of MMP-Attack is to induce the model to generate a specific target object while eliminating the original object from the image. The authors report that the method outperforms existing approaches in both manipulation capability and efficiency. Experiments show that MMP-Attack achieves high attack success rates on open-source text-to-image (T2I) models and exhibits good universality and transferability. The code for MMP-Attack is publicly available.
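To make the suffix idea concrete, here is a minimal toy sketch of choosing a suffix that pulls a prompt's features toward a target object in both the text and image feature spaces. It is not the authors' implementation. The summary does not specify the objective or the optimizer, so the cosine-similarity score, the `weight` parameter, the greedy search, and the mean-of-token-vectors `embed_prompt` encoder are all assumptions made for illustration, and the 2-D vectors are made-up stand-ins for real model features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embed_prompt(prompt, token_vectors):
    """Hypothetical text encoder: the mean of the known tokens' vectors."""
    vecs = [token_vectors[t] for t in prompt.split() if t in token_vectors]
    dim = len(next(iter(token_vectors.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def multimodal_score(prompt, token_vectors, text_feat, image_feat, weight=0.5):
    """Assumed objective: weighted similarity to the target's text and image features."""
    feat = embed_prompt(prompt, token_vectors)
    return weight * cosine(feat, text_feat) + (1 - weight) * cosine(feat, image_feat)

def greedy_suffix(prompt, vocab, token_vectors, text_feat, image_feat, suffix_len=2):
    """Greedy search (a stand-in optimizer): append the best-scoring token each step."""
    suffix = []
    for _ in range(suffix_len):
        def score_with(tok):
            candidate = " ".join([prompt] + suffix + [tok])
            return multimodal_score(candidate, token_vectors, text_feat, image_feat)
        suffix.append(max(vocab, key=score_with))
    return suffix

# Toy 2-D features: axis 0 is "car-like", axis 1 is "bird-like" (made-up values).
TOKEN_VECTORS = {
    "a": [0.5, 0.5], "photo": [0.5, 0.5], "of": [0.5, 0.5],
    "car": [1.0, 0.0], "wheel": [0.9, 0.1],
    "wings": [0.1, 0.9], "feather": [0.2, 0.8],
}
TARGET_TEXT_FEAT = [0.0, 1.0]   # assumed text feature of the target ("bird")
TARGET_IMAGE_FEAT = [0.1, 1.0]  # assumed image feature of the target

original_prompt = "a photo of car"
suffix = greedy_suffix(original_prompt, ["wheel", "wings", "feather"],
                       TOKEN_VECTORS, TARGET_TEXT_FEAT, TARGET_IMAGE_FEAT)
attacked_prompt = " ".join([original_prompt] + suffix)
print(attacked_prompt)
```

In this toy setup the original prompt is left untouched and only the appended tokens change, which mirrors the suffix-only setting described above. A real attack would score candidates with the target model's actual encoders rather than hand-written vectors.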