GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

October 28–November 01, 2024, Melbourne, Australia | Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu
GaussianTalker is a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting. It addresses limitations of existing techniques, such as unsynchronized or unnatural lip movements and visual artifacts, by using 3D Gaussians as an explicit representation that allows intuitive control of facial motion. GaussianTalker consists of two main modules: the Speaker-specific Motion Translator and the Dynamic Gaussian Renderer. The Speaker-specific Motion Translator achieves accurate lip movements by decoupling audio features from speaker identity and generating personalized embeddings. The Dynamic Gaussian Renderer enhances facial detail by incorporating Speaker-specific BlendShapes and refining Gaussian attributes through a latent pose.

The method produces high-quality videos with precise lip synchronization and realistic visual effects, reaching a rendering speed of 130 FPS on an NVIDIA RTX 4090 GPU, well beyond real-time requirements. Extensive experiments show that GaussianTalker outperforms state-of-the-art talking head synthesis methods in both quantitative and qualitative evaluations. The method generalizes across multiple languages and audio timbres, and the framework can be deployed on a range of hardware platforms. The paper also presents ablation studies that validate the contribution of each component, showing that integrating 3D Gaussian Splatting with the FLAME model significantly improves the quality and realism of the generated videos.
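To make the two-module pipeline concrete, the following is a minimal PyTorch sketch of how identity-decoupled audio features could drive blendshape-deformed Gaussian centers. All class names, dimensions, and the linear-blendshape formulation here are illustrative assumptions, not the paper's actual implementation, and the final splatting rasterization step is omitted.

```python
import torch
import torch.nn as nn

class MotionTranslator(nn.Module):
    """Maps identity-decoupled audio features, concatenated with a learned
    speaker embedding, to per-frame expression coefficients (hypothetical)."""
    def __init__(self, audio_dim=256, id_dim=64, expr_dim=50):
        super().__init__()
        # Personalized embedding optimized for one speaker (assumption).
        self.speaker_embed = nn.Parameter(torch.zeros(id_dim))
        self.net = nn.Sequential(
            nn.Linear(audio_dim + id_dim, 256),
            nn.ReLU(),
            nn.Linear(256, expr_dim),
        )

    def forward(self, audio_feat):
        # audio_feat: (T, audio_dim) -> expression coefficients (T, expr_dim)
        ident = self.speaker_embed.expand(audio_feat.shape[0], -1)
        return self.net(torch.cat([audio_feat, ident], dim=-1))


class DynamicGaussianRenderer(nn.Module):
    """Deforms canonical 3D Gaussian centers with per-coefficient
    displacement bases ("blendshapes"), then adds a latent-pose refinement."""
    def __init__(self, n_gauss=10_000, expr_dim=50, pose_dim=6):
        super().__init__()
        self.canonical_xyz = nn.Parameter(torch.randn(n_gauss, 3) * 0.01)
        # One (n_gauss, 3) displacement basis per expression coefficient.
        self.blendshapes = nn.Parameter(torch.zeros(expr_dim, n_gauss, 3))
        self.pose_refine = nn.Linear(pose_dim, n_gauss * 3)

    def forward(self, expr, pose):
        # expr: (expr_dim,), pose: (pose_dim,) -> deformed centers (n_gauss, 3)
        deform = torch.einsum("e,enc->nc", expr, self.blendshapes)
        refine = self.pose_refine(pose).view(-1, 3)
        return self.canonical_xyz + deform + refine


translator = MotionTranslator()
renderer = DynamicGaussianRenderer()
audio = torch.randn(1, 256)      # one frame of audio features (placeholder)
expr = translator(audio)[0]      # (expr_dim,)
pose = torch.zeros(6)            # latent head pose (placeholder)
xyz = renderer(expr, pose)       # centers to pass to a splatting rasterizer
print(xyz.shape)                 # torch.Size([10000, 3])
```

In this sketch the learned speaker embedding stands in for identity decoupling, and the per-coefficient displacement bases play the role of the Speaker-specific BlendShapes; the deformed centers would then be handed to a 3D Gaussian Splatting rasterizer for rendering.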