Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention

17 Mar 2024 | Jie Ren, Yaxin Li, Shenglai Zeng, Han Xu, Lingjuan Lyu, Yue Xing, and Jiliang Tang
This paper investigates the phenomenon of memorization in text-to-image diffusion models and proposes a novel approach to detect and mitigate it through cross-attention analysis. The authors find that diffusion models tend to memorize training data, which poses risks of copyright infringement and privacy issues. They analyze the cross-attention mechanism and discover that memorization occurs when the model disproportionately focuses on specific token embeddings, leading to overfitting. They propose a method to detect memorization by quantifying attention behavior and a mitigation strategy that adjusts attention dispersion without affecting generation quality or speed.

The detection method uses entropy metrics to distinguish between memorized and non-memorized samples, while the mitigation method reduces the attention on trigger tokens and increases the attention on the beginning token. Experiments show that their methods effectively reduce memorization without compromising image quality or generation speed. The study highlights the importance of understanding cross-attention mechanisms in diffusion models to address memorization issues.
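The two ideas above can be illustrated with a minimal NumPy sketch: an entropy score over cross-attention maps (low entropy means attention concentrated on a few tokens, the behavior linked to memorization), and a redistribution step that shifts attention mass from suspected trigger tokens onto the beginning-of-sequence token. The function names, the `shift` fraction, and the trigger indices are hypothetical illustrations; the paper's exact metric and rescaling rule may differ.

```python
import numpy as np

def attention_entropy(attn):
    """Mean Shannon entropy of cross-attention over text tokens.

    attn: array of shape (heads, query_positions, text_tokens), where each
    slice along the last axis is a probability distribution. Lower values
    indicate attention concentrated on few tokens.
    """
    p = np.clip(attn, 1e-12, 1.0)            # avoid log(0)
    ent = -(p * np.log(p)).sum(axis=-1)      # entropy per head and query
    return float(ent.mean())

def redistribute_attention(attn, trigger_idx, shift=0.5, bos_idx=0):
    """Move a fraction `shift` of the attention mass on suspected trigger
    tokens onto the beginning token, conserving total mass per row.
    `trigger_idx` must not contain `bos_idx`."""
    out = attn.copy()
    moved = out[..., trigger_idx] * shift    # mass taken from triggers
    out[..., trigger_idx] -= moved
    out[..., bos_idx] += moved.sum(axis=-1)  # deposit on beginning token
    return out
```

As a sanity check, a distribution peaked on one token scores lower entropy than a uniform one, and redistributing mass away from the peaked (trigger) token raises the entropy while keeping each row a valid distribution.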