2024-03-16 | Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, and Peipei Li
StableGarment is a unified framework for garment-centric generation tasks, including text-to-image generation, controllable text-to-image generation, stylized text-to-image generation, and virtual try-on. Built upon Stable Diffusion, it incorporates a garment encoder with additive self-attention (ASA) layers that preserves garment details while retaining flexibility in image creation. A dedicated try-on ControlNet enables precise virtual try-on, and a data engine produces high-quality synthesized training data to improve the model's prompt-following ability.
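To make the try-on pathway concrete, the sketch below shows one plausible way a try-on ControlNet conditioning input could be assembled: a garment-agnostic person image (the original garment masked out) concatenated channel-wise with a pose map. This is a minimal illustration; the channel layout, helper name, and pose-map format are assumptions, not the paper's exact interface.

```python
# Hypothetical assembly of a try-on ControlNet conditioning tensor.
import torch


def build_tryon_condition(person: torch.Tensor,
                          garment_mask: torch.Tensor,
                          pose_map: torch.Tensor) -> torch.Tensor:
    """person: (B, 3, H, W) in [-1, 1]; garment_mask: (B, 1, H, W), 1 on the
    garment region to be replaced; pose_map: (B, C_pose, H, W)."""
    agnostic = person * (1.0 - garment_mask)  # erase the original garment
    return torch.cat([agnostic, garment_mask, pose_map], dim=1)


# Example shapes; the resulting tensor would be fed to the try-on ControlNet,
# whose residuals then steer the denoising UNet during sampling.
cond = build_tryon_condition(torch.randn(1, 3, 512, 384),
                             torch.zeros(1, 1, 512, 384),
                             torch.randn(1, 17, 512, 384))
print(cond.shape)  # torch.Size([1, 21, 512, 384])
```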
The main challenge in garment-centric generation is preserving the intricate textures of the garment while retaining the flexibility of the pre-trained Stable Diffusion model. StableGarment addresses this with a garment encoder that captures fine-grained garment features and injects them into the denoising UNet via ASA layers, which transfer detailed garment textures and also allow stylized base models to be plugged in. A try-on ControlNet is trained to superimpose garments onto user images for virtual try-on. In addition, the training dataset is restructured and enriched with varied text prompts to strengthen the model's ability to follow prompts.
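Below is a minimal PyTorch sketch of one plausible form of an additive self-attention (ASA) layer, assuming the garment-encoder features are appended to the key/value tokens of the UNet self-attention and the attended result is added back through a learnable gate. Layer and parameter names are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveSelfAttention(nn.Module):
    """Self-attention with an additive garment-feature branch (hypothetical ASA)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)
        # Learnable gate on the garment branch, initialized to zero so training
        # starts from the vanilla UNet behavior (an assumption for this sketch).
        self.garment_scale = nn.Parameter(torch.zeros(1))

    def _attend(self, q, k, v):
        b, n, d = q.shape
        h = self.num_heads
        q, k, v = (t.view(b, -1, h, d // h).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        return out.transpose(1, 2).reshape(b, n, d)

    def forward(self, x, garment_feats=None):
        # x: (B, N, D) UNet hidden states; garment_feats: (B, M, D) from the garment encoder.
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        out = self._attend(q, k, v)  # ordinary self-attention path
        if garment_feats is not None:
            gk, gv = self.to_k(garment_feats), self.to_v(garment_feats)
            # Attend over image + garment tokens and add the result to the
            # vanilla self-attention output ("additive" injection).
            out = out + self.garment_scale * self._attend(
                q, torch.cat([k, gk], dim=1), torch.cat([v, gv], dim=1)
            )
        return self.to_out(out)


# Example: one sample, 64x48 latent tokens of width 320, 77 garment tokens.
x = torch.randn(1, 64 * 48, 320)
g = torch.randn(1, 77, 320)
print(AdditiveSelfAttention(320)(x, g).shape)  # torch.Size([1, 3072, 320])
```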
Extensive experiments demonstrate that StableGarment achieves state-of-the-art results on virtual try-on benchmarks and exhibits high flexibility, with broad potential applications in garment-centric image generation. The framework handles diverse garment-centric tasks and adapts to different styles and conditions. Compared with existing methods on quantitative metrics and in user studies, it shows superior garment texture preservation and more realistic try-on quality.