17 Mar 2024 | Jie Ren, Yixin Li, Shenglai Zeng, Han Xu, Lingjuan Lyu, Yue Xing, and Jiliang Tang
The paper "Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention" by Jie Ren et al. explores memorization in text-to-image diffusion models, a phenomenon that can lead to copyright infringement and privacy risks. The authors focus on the relationship between cross-attention and memorization, observing that cross-attention tends to focus disproportionately on the embeddings of specific tokens when a sample is memorized. They identify three key findings:
1. **Concentrated Attention on Trigger Tokens**: For non-memorized samples, cross-attention concentrates almost entirely on the beginning token; for memorized samples, attention shifts onto the embeddings of specific trigger tokens, so the attention distribution is more dispersed and its entropy is higher than for non-memorized samples.
2. **Different Types of Memorization Focus on Different Tokens**: Matching memorization (MM) focuses more on summary tokens, while retrieval memorization (RM) and template memorization (TM), whose prompts share similar tokens, show a slower reduction in attention scores on summary tokens.
3. **Concentration in Certain U-Net Layers**: Different U-Net layers exhibit varying levels of concentration on trigger tokens, with some layers showing clearer separation between memorized and non-memorized samples.
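The entropy statistic behind finding 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy attention vectors and the `attention_entropy` helper are assumptions for demonstration, and a real system would average cross-attention maps over spatial positions, heads, layers, and denoising steps before computing entropy.

```python
import numpy as np

def attention_entropy(attn):
    """Shannon entropy of a cross-attention distribution over prompt tokens."""
    attn = np.asarray(attn, dtype=float)
    attn = attn / attn.sum()          # normalize to a probability distribution
    attn = attn[attn > 0]             # drop zero-mass tokens (0 * log 0 := 0)
    return float(-(attn * np.log(attn)).sum())

# Toy numbers (assumed, not from the paper): a non-memorized prompt puts
# almost all attention on the beginning token, giving near-zero entropy;
# a memorized prompt spreads attention over trigger tokens, raising entropy.
non_memorized = [0.97, 0.01, 0.01, 0.01]   # dominated by the beginning token
memorized     = [0.40, 0.30, 0.20, 0.10]   # dispersed over trigger tokens
assert attention_entropy(memorized) > attention_entropy(non_memorized)
```

Thresholding such an entropy score is one way a detector could flag a prompt as likely memorized.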
Based on these findings, the authors propose detection and mitigation methods that do not compromise the speed of training or inference while preserving image generation quality. The detection methods use metrics based on attention entropy and token attention patterns, while the mitigation methods adjust the attention weights during inference and remove high-entropy samples during training. Experimental results demonstrate the effectiveness of these methods in reducing memorization without sacrificing generation quality.
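One plausible way to realize "adjusting the attention weights during inference" is to temper the attention logits so that probability mass flows back toward the beginning token and away from trigger tokens. The sketch below is a hypothetical mechanism under that assumption, not the authors' exact procedure; `rescale_attention`, the `c` boost factor, and the toy logits are all illustrative.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())   # subtract max for numerical stability
    return z / z.sum()

def rescale_attention(logits, bos_index=0, c=2.0):
    """Hypothetical mitigation: add log(c) to the beginning-token logit,
    which multiplies its attention weight by roughly c after softmax."""
    adjusted = np.array(logits, dtype=float)
    adjusted[bos_index] += np.log(c)
    return softmax(adjusted)

logits = np.array([1.0, 1.2, 0.8, 1.1])   # toy cross-attention logits (assumed)
before = softmax(logits)
after = rescale_attention(logits, c=4.0)
assert after[0] > before[0]               # more mass on the beginning token
```

Because only the logits are shifted, this reweighting adds no extra forward passes, which is consistent with the paper's claim that mitigation does not slow down inference.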