October 28-November 1, 2024 | Weifeng Chen, Tao Gu, Yuhao Xu, Arlene Chen
Magic Clothing is a latent diffusion model (LDM)-based network architecture for garment-driven image synthesis. The goal is to generate characters wearing target garments based on text prompts, with the key challenge being to preserve garment details while maintaining faithfulness to the text. The method introduces a garment extractor that captures detailed garment features and incorporates them into the pretrained LDMs via self-attention fusion. This ensures that garment details remain unchanged on the target character. The joint classifier-free guidance balances the control of garment features and text prompts, enabling the model to generate photorealistic and anime-style images. The garment extractor is a plug-in module compatible with various fine-tuned LDMs and extensions like ControlNet and IP-Adapter, enhancing the diversity and controllability of generated characters. A robust metric, Matched-Points-LPIPS (MP-LPIPS), is proposed to evaluate the consistency of the target image to the source garment. Extensive experiments show that Magic Clothing achieves state-of-the-art results in garment-driven image synthesis under various conditional controls. The source code is available at https://github.com/ShineChen1024/MagicClothing.
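To make the self-attention fusion idea concrete, here is a minimal NumPy sketch of one common way such fusion is done: queries come from the denoising stream only, while garment features are concatenated into the key/value tokens so the character image can attend to garment detail. All names, shapes, and the single-head formulation are illustrative assumptions; the paper's actual module operates inside the UNet's attention layers.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fused_self_attention(x, g, wq, wk, wv):
    """Hypothetical single-head self-attention fusion.

    x : (n_x, d) tokens from the denoising (character) stream
    g : (n_g, d) garment feature tokens from the extractor
    Queries are computed from x alone; keys and values are computed from
    the concatenation [x; g], so every token of x can attend to garment
    tokens without changing the output sequence length.
    """
    q = x @ wq                                   # (n_x, d)
    kv = np.concatenate([x, g], axis=0)          # (n_x + n_g, d)
    k, v = kv @ wk, kv @ wv
    att = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_x, n_x + n_g)
    return att @ v                               # (n_x, d)
```

Because the fusion only widens the key/value set, the module can be bolted onto a pretrained attention layer, which is what makes the extractor pluggable into different fine-tuned LDMs.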
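The joint classifier-free guidance can be sketched as a nested guidance rule over the two conditions: one scale amplifies the garment signal relative to the unconditional prediction, and a second amplifies the text signal relative to the garment-only prediction. The function name, argument layout, and default scales below are assumptions for illustration, not the paper's exact weighting.

```python
import numpy as np

def joint_cfg(eps_uncond, eps_garment, eps_full, s_garment=2.0, s_text=7.5):
    """Combine three denoiser outputs with joint classifier-free guidance.

    eps_uncond  : noise prediction with garment and text both dropped
    eps_garment : prediction conditioned on the garment only
    eps_full    : prediction conditioned on both garment and text
    s_garment, s_text : guidance scales balancing the two conditions
    """
    return (eps_uncond
            + s_garment * (eps_garment - eps_uncond)
            + s_text * (eps_full - eps_garment))
```

Raising `s_garment` pushes the sample toward garment fidelity, while raising `s_text` pushes it toward prompt faithfulness, which is exactly the trade-off the abstract describes.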
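The intent behind MP-LPIPS can be sketched as: match corresponding points between the source garment and the generated image, crop a patch around each matched pair, and average a perceptual distance over the patches. The sketch below is a loose illustration under stated assumptions: the point matching step is external (not shown), and a simple L2 placeholder stands in for an actual LPIPS network; the paper's metric may differ in both matching and aggregation.

```python
import numpy as np

def mp_lpips(garment, target, matches, patch=32, distance=None):
    """Illustrative Matched-Points-LPIPS-style score.

    garment, target : images as (H, W) or (H, W, C) arrays
    matches  : list of ((x_g, y_g), (x_t, y_t)) matched point pairs,
               assumed to come from a separate point-matching step
    distance : perceptual distance on patch pairs; an LPIPS model in
               practice, replaced here by mean squared error
    """
    if distance is None:
        distance = lambda a, b: float(np.mean((a - b) ** 2))  # L2 stand-in
    h = patch // 2
    scores = []
    for (xg, yg), (xt, yt) in matches:
        pg = garment[yg - h:yg + h, xg - h:xg + h]
        pt = target[yt - h:yt + h, xt - h:xt + h]
        scores.append(distance(pg, pt))
    return float(np.mean(scores))
```

Scoring only matched patches makes the metric robust to pose and layout changes between the flat garment photo and the dressed character, which is the motivation for preferring it over whole-image similarity.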