V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation

4 Jun 2024 | Cong Wang, Kuan Tian, Jun Zhang, Yonghang Guan, Feng Luo, Fei Shen, Zhiwei Jiang, Qing Gu, Xiao Han, Wei Yang
V-Express is a method for balancing control signals of different strengths in portrait video generation. It builds on a Latent Diffusion Model (LDM) and adds three modules, ReferenceNet, V-Kps Guider, and Audio Projection, which handle reference images, V-Kps images, and audio signals, respectively. V-Express uses a progressive training strategy with conditional dropout so that weaker conditions, such as audio, can exert effective control while stronger signals, such as facial pose and the reference image, keep their influence. Experimental results show that V-Express generates high-quality portrait videos synchronized with the audio while keeping facial identity and pose consistent. The method thereby offers a way to use conditions of varying strengths simultaneously and effectively, which improves the overall quality of the generated videos.
Future work aims to improve multilingual support, reduce computational burden, and enable explicit control of facial attributes.
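The summary above names conditional dropout but does not spell out how it is applied. As a rough illustration only, the sketch below shows one way the idea could look: during a training step, each condition is independently replaced by zeros with some probability, so that the model cannot always lean on the strong conditions and has to make use of the weak one. The function name `apply_conditional_dropout`, the condition keys, the use of zeros as the "dropped" value, and the probabilities are all assumptions made for this example; they are not taken from V-Express.

```python
import random


def apply_conditional_dropout(conditions, drop_probs, rng):
    """Return a copy of `conditions` with some entries zeroed out.

    conditions: dict mapping a condition name to a list of floats
                (a stand-in for a feature tensor).
    drop_probs: dict mapping a condition name to its drop probability;
                names that are missing are never dropped.
    rng:        a random.Random instance, passed in for reproducibility.
    """
    result = {}
    for name, value in conditions.items():
        p = drop_probs.get(name, 0.0)
        if rng.random() < p:
            # Hypothetical choice: a dropped condition becomes all zeros.
            result[name] = [0.0] * len(value)
        else:
            result[name] = list(value)
    return result


# Hypothetical setup: drop the strong conditions often, never drop audio.
rng = random.Random(0)
conditions = {
    "reference": [0.2, 0.7, 0.1],
    "kps": [1.0, 0.0],
    "audio": [0.5, 0.5, 0.5, 0.5],
}
drop_probs = {"reference": 0.5, "kps": 0.5, "audio": 0.0}
dropped = apply_conditional_dropout(conditions, drop_probs, rng)
```

In a progressive schedule, such drop probabilities could differ from one training stage to the next; the actual stages and rates used by V-Express are not given in this summary.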