IMAGDressing-v1: Customizable Virtual Dressing
6 Aug 2024 | Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinhui Tang
IMAGDressing-v1 is a novel virtual dressing (VD) framework designed to generate editable human images with fixed garments and optional conditions, such as faces, poses, and scenes. The framework addresses the limitations of existing virtual try-on (VTON) technologies, which primarily focus on consumer scenarios and lack the flexibility to showcase garments comprehensively.

IMAGDressing-v1 incorporates a garment UNet that captures semantic and texture features from CLIP and VAE, respectively, and a denoising UNet with a hybrid attention module to integrate these features with text prompts. The hybrid attention module combines frozen self-attention with trainable cross-attention, allowing users to control different scenes through text. The framework can also be combined with extensions such as ControlNet and IP-Adapter to enhance the diversity and controllability of generated images.

To address data scarcity, the authors released the IGPair dataset, containing over 300,000 pairs of clothing and dressed images, and established a standard pipeline for data assembly. Extensive experiments demonstrate that IMAGDressing-v1 achieves state-of-the-art performance in controlled human image synthesis, outperforming other methods across various evaluation metrics.
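The hybrid attention mechanism can be illustrated with a short sketch. The PyTorch block below is not the authors' released code; it is a minimal, illustrative interpretation of what the abstract describes, assuming a frozen self-attention branch over the denoising UNet's latent tokens, a trainable cross-attention branch that attends to garment features, additive fusion of the two branches, and hypothetical tensor dimensions.

```python
import torch
import torch.nn as nn


class HybridAttentionBlock(nn.Module):
    """Illustrative hybrid attention block: frozen self-attention plus a
    trainable cross-attention branch that injects garment features
    (e.g., CLIP semantics and VAE textures from a garment UNet)."""

    def __init__(self, dim: int = 320, num_heads: int = 8):
        super().__init__()
        # Self-attention inherited from the pretrained denoising UNet,
        # kept frozen as described in the paper's summary.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False

        # Trainable cross-attention: queries from the latent tokens,
        # keys/values from garment features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, latent_tokens: torch.Tensor, garment_tokens: torch.Tensor) -> torch.Tensor:
        # latent_tokens:  (B, N, dim) tokens from the denoising UNet
        # garment_tokens: (B, M, dim) features from the garment UNet
        h = self.norm1(latent_tokens)
        self_out, _ = self.self_attn(h, h, h)

        h = self.norm2(latent_tokens)
        cross_out, _ = self.cross_attn(h, garment_tokens, garment_tokens)

        # Additive fusion of the two branches (an assumption; the actual
        # weighting used by IMAGDressing-v1 may differ).
        return latent_tokens + self_out + cross_out


# Example usage with dummy tensors.
if __name__ == "__main__":
    block = HybridAttentionBlock()
    latents = torch.randn(2, 64, 320)   # denoising UNet tokens
    garment = torch.randn(2, 77, 320)   # garment UNet features
    print(block(latents, garment).shape)  # torch.Size([2, 64, 320])
```

Keeping the self-attention branch frozen preserves the generative prior of the pretrained denoising UNet, while the trainable cross-attention learns to bind garment appearance to the generated person; text prompts continue to act through the UNet's existing text cross-attention, which is how scene editing remains available.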