16 Mar 2024 | Rui Wang1†*, Hailong Guo1†, Jiaming Liu2†, Huaxia Li2, Haibo Zhao2, Xu Tang2, Yao Hu2, Hao Tang3, and Peipei Li1†
StableGarment is a unified framework designed to address garment-centric (GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on. The main challenge lies in preserving the intricate textures of garments while maintaining the flexibility of pre-trained Stable Diffusion. To achieve this, the framework introduces a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention (ASA) layers. These layers are specifically designed to transfer detailed garment textures and to facilitate the integration of stylized base models. Additionally, a dedicated try-on ControlNet is incorporated to enable precise virtual try-on. The framework also includes a novel data engine that generates high-quality synthesized data to enhance the model's ability to follow prompts. Extensive experiments demonstrate that StableGarment outperforms existing methods, achieving state-of-the-art performance, and exhibits high flexibility, with broad potential applications across garment-centric image generation tasks.
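The additive self-attention (ASA) idea described above can be sketched as follows: the UNet's standard self-attention output is combined additively with attention over features produced by the garment encoder. This is a minimal illustrative sketch, assuming a learnable zero-initialized gate and standard multi-head attention; the exact layer design, dimensions, and injection points are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AdditiveSelfAttention(nn.Module):
    """Hedged sketch of an ASA layer: standard self-attention plus a
    gated additive attention term over garment-encoder features.
    Sizes and the zero-init gate are illustrative assumptions."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.garment_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Zero-initialized gate so the layer starts as plain self-attention
        # (an assumed design choice, common when adding trainable branches).
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor, garment_feat: torch.Tensor) -> torch.Tensor:
        base, _ = self.self_attn(x, x, x)  # usual UNet self-attention
        # Attend from UNet tokens to garment-encoder tokens to pull in texture.
        extra, _ = self.garment_attn(x, garment_feat, garment_feat)
        return base + self.gate * extra  # additive combination

# Usage: x are UNet hidden tokens; garment_feat comes from the garment encoder.
x = torch.randn(2, 64, 320)
garment_feat = torch.randn(2, 96, 320)
out = AdditiveSelfAttention(320)(x, garment_feat)
```

Because the garment branch is additive rather than a replacement of the self-attention, the base UNet can in principle be swapped for a stylized checkpoint while the garment encoder still injects texture, which matches the flexibility the abstract claims.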