DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

23 Apr 2024 | Jisu Nam, Heesu Kim, DongJae Lee, Siyoon Jin, Seungryong Kim, Seunggyu Chang
DreamMatcher is a plug-in method for text-to-image (T2I) personalization that improves how faithfully a generated image reproduces the appearance of a reference subject while preserving the structure of the target image. It is compatible with existing T2I models and requires no additional training or fine-tuning. The method addresses a core difficulty in T2I personalization: text embeddings lack spatial expressivity, so they often fail to mimic a subject's appearance accurately. DreamMatcher reformulates personalization as semantic matching. Within the self-attention module, it replaces the target values with reference values that semantic matching has aligned to the target structure, and it leaves the structure path of the pre-trained T2I model unchanged. A semantic-consistent masking strategy isolates the personalized concept from irrelevant regions introduced by the target prompts.
In extensive experiments, DreamMatcher outperforms existing tuning-free plug-in methods and optimization-based models, with significant improvements in complex scenarios. It generates high-fidelity images aligned with the target prompts, even in extreme non-rigid personalization scenarios, and ablation studies and user evaluations show that it remains robust in challenging cases.
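The value-replacement idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the structure path is taken to be the query-key attention map, matching is done by nearest-neighbor cosine similarity between per-token features, and the mask is a simple similarity threshold. The function names (`match_reference_to_target`, `appearance_matching_attention`) and the `threshold` parameter are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def match_reference_to_target(feat_tgt, feat_ref):
    """For each target token, find the most similar reference token.

    Assumption: cosine similarity plus argmax stands in for semantic matching.
    Returns the matched reference index and the similarity score per target token.
    """
    t = feat_tgt / (np.linalg.norm(feat_tgt, axis=1, keepdims=True) + 1e-8)
    r = feat_ref / (np.linalg.norm(feat_ref, axis=1, keepdims=True) + 1e-8)
    sim = t @ r.T                      # (N_tgt, N_ref) similarity matrix
    return sim.argmax(axis=1), sim.max(axis=1)


def appearance_matching_attention(q_tgt, k_tgt, v_tgt, v_ref,
                                  feat_tgt, feat_ref, threshold=0.5):
    """Self-attention whose values are swapped for aligned reference values.

    The attention map is computed from the target queries and keys only, so
    the target structure is left untouched. Hypothetical sketch.
    """
    idx, score = match_reference_to_target(feat_tgt, feat_ref)
    v_warped = v_ref[idx]              # reference values aligned to the target layout

    # Assumed masking rule: trust the reference only where the match is strong,
    # otherwise fall back to the target's own values.
    mask = (score >= threshold).astype(v_tgt.dtype)[:, None]
    v_mixed = mask * v_warped + (1.0 - mask) * v_tgt

    d = q_tgt.shape[-1]
    attn = softmax(q_tgt @ k_tgt.T / np.sqrt(d))   # structure path: Q and K only
    return attn @ v_mixed, attn, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, c = 6, 4, 8
    q, k = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    v_tgt, v_ref = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    feat_ref = rng.normal(size=(n, c))
    feat_tgt = feat_ref[rng.permutation(n)]        # target is a shuffled reference
    out, attn, mask = appearance_matching_attention(
        q, k, v_tgt, v_ref, feat_tgt, feat_ref)
    print(out.shape, int(mask.sum()))
```

In this toy setup the target features are a permutation of the reference features, so every target token finds its exact counterpart and the mask accepts all of them. In a real diffusion model the queries, keys, values, and matching features would come from the denoising network at each step, which this sketch does not model.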