23 Mar 2024 | Zhenhui Ye *†♡ Tianyun Zhong *†♡ Yi Ren ♡ Jiaqi Yang †♡ Weichuang Li ♢ Jiawei Huang ♠♡ Ziyue Jiang ♠♡ Jinzheng He †♡ Rongjie Huang ♠ Jinglin Liu ♡ Chen Zhang ♡ Xiang Yin ♡ Zejun Ma ♡ Zhou Zhao ♠
Real3D-Portrait is a framework for one-shot realistic 3D talking portrait synthesis. It aims to reconstruct a 3D avatar from an unseen image and animate it using a reference video or audio to generate a talking portrait video. The framework addresses the limitations of existing methods by improving 3D reconstruction and animation capabilities, synthesizing natural torso movement and switchable background, and supporting audio-driven applications. Key contributions include:
1. **Image-to-Plane (I2P) Model**: A large-scale feed-forward network that distills 3D prior knowledge from a 3D face generative model to improve 3D reconstruction accuracy.
2. **Motion Adapter (MA)**: An efficient network that morphs the predicted 3D representation based on input motion conditions, ensuring accurate and stable face animation.
3. **Head-Torso-Background Super-Resolution (HTB-SR) Model**: A model that synthesizes realistic torso movement and switchable background, enhancing the realism of the final video.
4. **Generic Audio-to-Motion (A2M) Model**: A model that transforms raw audio into facial motion representations, enabling audio-driven synthesis alongside the video-driven setting.
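To make the data flow between these four components concrete, here is a minimal, purely illustrative Python sketch of how the stages compose into a one-shot pipeline. All names, data containers, and string placeholders below are hypothetical stand-ins (they are not from the paper's actual implementation, which operates on tri-plane features and rendered frames):

```python
from dataclasses import dataclass

# Hypothetical containers; real Real3D-Portrait uses tri-plane tensors and images.
@dataclass
class TriPlane:
    identity: str          # stands in for the canonical 3D representation

@dataclass
class Motion:
    frames: list           # stands in for per-frame facial motion codes

def image_to_plane(image: str) -> TriPlane:
    """I2P: reconstruct a canonical 3D representation from one source image."""
    return TriPlane(identity=image)

def audio_to_motion(audio: list) -> Motion:
    """A2M: map raw audio to per-frame facial motion representations."""
    return Motion(frames=[f"motion_{a}" for a in audio])

def motion_adapter(plane: TriPlane, motion: Motion) -> list:
    """MA: morph the canonical representation according to each motion frame."""
    return [f"{plane.identity}|{m}" for m in motion.frames]

def htb_sr(head_frames: list, torso: str, background: str) -> list:
    """HTB-SR: fuse animated head, torso, and a switchable background."""
    return [f"{f}+{torso}+{background}" for f in head_frames]

def talking_portrait(image, audio, torso="src_torso", background="src_bg"):
    plane = image_to_plane(image)
    motion = audio_to_motion(audio)   # a video-driven run would supply motion directly
    heads = motion_adapter(plane, motion)
    return htb_sr(heads, torso, background)
```

Note the design point this sketch highlights: because motion enters only through `motion_adapter`, the same reconstructed `TriPlane` can be driven either by A2M output (audio-driven) or by motion extracted from a reference video (video-driven), and the background argument to `htb_sr` is switchable independently of the head and torso.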
Experiments demonstrate that Real3D-Portrait outperforms existing one-shot talking face systems and achieves performance comparable to state-of-the-art person-specific methods. The framework is evaluated on both video-driven and audio-driven tasks, showing superior results in identity preservation, visual quality, and audio-lip synchronization.