OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

7 Mar 2024 | Yuhao Xu, Tao Gu, Weifeng Chen, and Chengcai Chen
OOTDiffusion is a novel latent diffusion model (LDM) for image-based virtual try-on (VTON), offering realistic and controllable results. The model leverages pretrained LDMs and learns garment details in the latent space through an outfitting UNet. It introduces outfitting fusion to align garment features with the target human body in the self-attention layers of the denoising UNet, eliminating the need for explicit warping. Outfitting dropout is also applied during training to enable classifier-free guidance, allowing fine-grained control over the strength of garment features.

Experiments on the VITON-HD and Dress Code datasets show that OOTDiffusion outperforms existing VTON methods in both realism and controllability, achieving high-quality try-on results for various human and garment images. The model's ability to preserve garment details while adapting to different body types and postures is a key contribution. The method is trained on high-resolution datasets and performs strongly across garment categories, including upper-body, lower-body, and dresses. It also generalizes well across datasets, indicating its effectiveness in real-world applications. Overall, the results highlight the advantages of OOTDiffusion in generating realistic and controllable virtual try-on images.
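To make the two core mechanisms concrete, here is a minimal sketch of outfitting fusion and of classifier-free guidance enabled by outfitting dropout. It is an illustration under assumptions, not the authors' implementation: the class and function names (OutfittingFusionAttention, classifier_free_guidance) and the use of PyTorch's MultiheadAttention are hypothetical stand-ins, assuming the outfitting UNet and denoising UNet produce feature maps of matching spatial resolution at each block.

```python
import torch
import torch.nn as nn


class OutfittingFusionAttention(nn.Module):
    """Sketch of outfitting fusion inside a self-attention layer of the denoising UNet."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, human_feats: torch.Tensor, garment_feats: torch.Tensor) -> torch.Tensor:
        # human_feats, garment_feats: (batch, tokens, dim) flattened spatial feature maps.
        # Concatenating along the token (spatial) dimension lets self-attention align
        # garment details with the target body, avoiding explicit warping.
        fused = torch.cat([human_feats, garment_feats], dim=1)
        out, _ = self.attn(fused, fused, fused)
        # Keep only the human-side tokens; garment information has been attended in.
        return out[:, : human_feats.shape[1], :]


def classifier_free_guidance(eps_cond: torch.Tensor,
                             eps_uncond: torch.Tensor,
                             guidance_scale: float) -> torch.Tensor:
    # Outfitting dropout (randomly zeroing the garment conditioning during training)
    # makes an unconditional noise prediction available, so garment strength can be
    # tuned at inference with the standard classifier-free guidance combination.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

In this sketch, a larger guidance_scale pushes the output toward the garment-conditioned prediction, which is how the paper exposes fine-grained control over how strongly garment features appear in the try-on result.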