7 Mar 2024 | Yuhao Xu, Tao Gu, Weifeng Chen, and Chengcai Chen
OOTDiffusion is a novel network architecture designed for realistic and controllable image-based virtual try-on (VTON). It leverages pretrained latent diffusion models to learn garment detail features using an outfitting UNet, which is integrated into the denoising UNet through outfitting fusion. This process aligns garment features with the target human body without the need for a redundant warping step. To enhance controllability, outfitting dropout is introduced during training, allowing classifier-free guidance by adjusting the strength of garment features. Extensive experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion outperforms other VTON methods in both realism and controllability, showcasing its effectiveness and potential in virtual try-on applications.
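The interplay between outfitting dropout and classifier-free guidance can be sketched as follows. This is a minimal NumPy illustration, not OOTDiffusion's actual implementation: the dropout probability, tensor shapes, and function names are assumptions, and the real model operates on latent features from a pretrained UNet.

```python
import numpy as np

def outfitting_dropout(garment_feats, p_drop=0.1, rng=None):
    # During training, zero out the garment features for a random
    # subset of samples so the denoising model also learns the
    # unconditional (no-garment) case required for classifier-free
    # guidance. (p_drop and shapes here are illustrative assumptions.)
    rng = rng or np.random.default_rng(0)
    batch = garment_feats.shape[0]
    keep = (rng.random(batch) >= p_drop).astype(garment_feats.dtype)
    return garment_feats * keep.reshape(batch, *([1] * (garment_feats.ndim - 1)))

def cfg_noise(eps_cond, eps_uncond, guidance_scale):
    # Classifier-free guidance at inference: blend the unconditional
    # and garment-conditioned noise predictions. A scale of 1.0
    # recovers the conditional prediction; larger values strengthen
    # the influence of the garment features.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy usage with random arrays standing in for UNet feature maps
# and noise predictions.
feats = np.random.randn(4, 8, 16, 16)
dropped = outfitting_dropout(feats, p_drop=0.5)
eps_cond = np.random.randn(4, 8, 16, 16)
eps_uncond = np.random.randn(4, 8, 16, 16)
eps = cfg_noise(eps_cond, eps_uncond, guidance_scale=1.5)
```

The guidance scale is exactly the "strength of garment features" knob the abstract refers to: because some training samples saw zeroed garment features, the model can predict both conditional and unconditional noise, and their weighted difference controls how strongly the garment is imposed on the output.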