REAL3D-PORTRAIT: ONE-SHOT REALISTIC 3D TALKING PORTRAIT SYNTHESIS

2024 | Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao
Real3D-Portrait is a one-shot 3D talking portrait synthesis framework: it reconstructs a 3D avatar from a single unseen image and animates it with a reference video or an audio clip to generate a talking portrait video. It addresses limitations of existing methods by improving both 3D reconstruction and animation, producing natural torso movement with switchable background rendering, and supporting audio-driven applications.

The framework consists of four main components: a large image-to-plane (I2P) model for 3D face reconstruction, a motion adapter for animation, a head-torso-background super-resolution (HTB-SR) model for realistic video synthesis, and a generic audio-to-motion (A2M) model for audio-driven applications. The I2P model is pre-trained on a multi-view image dataset to distill 3D prior knowledge from a 3D face generative model. The motion adapter morphs the predicted 3D representation according to the input motion condition. The HTB-SR model synthesizes the final image with natural torso movement and a switchable background. The A2M model transforms audio signals into motion representations.

Extensive experiments show that Real3D-Portrait outperforms existing one-shot talking face systems and performs comparably to state-of-the-art person-specific methods. Because it handles both video-driven and audio-driven input, the authors present it as the first one-shot 3D face system to support both scenarios. The quantitative evaluation covers image quality, identity preservation, and audio-lip synchronization. The qualitative evaluation shows realistic talking portraits with natural movements and switchable backgrounds. Ablation studies further show that each component contributes to the final results.
Overall, Real3D-Portrait provides a comprehensive solution for one-shot 3D talking portrait synthesis, achieving high-quality results in both video and audio-driven scenarios.
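
The data flow between the four components described above can be sketched in a few lines of Python. This is a toy illustration only, not the authors' implementation: every function name, the list-of-floats "plane", the additive morphing step, and the torso scaling factor are hypothetical stand-ins chosen to make the flow runnable. It shows the I2P reconstruction running once per source image, and the video-driven and audio-driven paths sharing the same motion adapter and HTB-SR stages.

```python
from dataclasses import dataclass
from typing import List, Optional

Plane = List[float]  # toy stand-in for the reconstructed 3D representation


@dataclass
class Frame:
    head: Plane         # morphed head representation for this frame
    torso_shift: float  # toy stand-in for torso movement
    background: str     # label of the background that was composited


def image_to_plane(source_image: List[float]) -> Plane:
    """Toy I2P: map a source image to a canonical 3D representation."""
    return [0.5 * pixel for pixel in source_image]


def audio_to_motion(audio: List[float]) -> List[float]:
    """Toy A2M: produce one motion value per audio sample."""
    return [abs(sample) for sample in audio]


def motion_adapter(plane: Plane, motion: float) -> Plane:
    """Toy motion adapter: morph the representation given a motion condition.

    The additive offset is an assumption of this sketch.
    """
    return [value + motion for value in plane]


def htb_sr(plane: Plane, motion: float, background: str) -> Frame:
    """Toy HTB-SR: combine head, torso and a switchable background."""
    return Frame(head=plane, torso_shift=0.1 * motion, background=background)


def synthesize(
    source_image: List[float],
    *,
    video_motion: Optional[List[float]] = None,
    audio: Optional[List[float]] = None,
    background: str = "original",
) -> List[Frame]:
    """Animate one source image from either per-frame video motion or audio."""
    if (video_motion is None) == (audio is None):
        raise ValueError("provide exactly one of video_motion or audio")
    # Audio-driven input goes through A2M first; video motion is used directly.
    motions = video_motion if video_motion is not None else audio_to_motion(audio)
    canonical = image_to_plane(source_image)  # reconstructed once per identity
    return [htb_sr(motion_adapter(canonical, m), m, background) for m in motions]


if __name__ == "__main__":
    for frame in synthesize([1.0, 2.0], audio=[-0.5, 0.25], background="studio"):
        print(frame)
```

Keeping the background as an argument of the final stage mirrors the "switchable background" idea: the same animated head and torso can be composited onto a different background without rerunning the reconstruction.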