TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

5 Jul 2024 | Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu
**Authors:** Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu

**Institutions:** School of Computer Science and Engineering, State Key Laboratory of Complex & Critical Software Environment, Jiangxi Research Institute, Beihang University; Institute of Semiconductors, Chinese Academy of Sciences; School of Information and Communication Technology, Griffith University; RIKEN AIP; The University of Tokyo

**Abstract:** Radiance fields have shown impressive performance in synthesizing lifelike 3D talking heads. However, fitting steep appearance changes is difficult for these methods and often produces distortions in dynamic facial regions. TalkingGaussian addresses this with a deformation-based radiance-field framework for high-fidelity talking head synthesis. Built on point-based 3D Gaussian Splatting (3DGS), it represents facial motions as smooth, continuous deformations applied to persistent Gaussian primitives, so the model never has to learn per-frame appearance changes. This yields precise facial motion while keeping the facial structure intact. The model is further decomposed into two branches, one for the face and one for the inside of the mouth, which simplifies the learning task and improves synthesis quality. Extensive experiments show that TalkingGaussian renders high-quality, lip-synchronized talking head videos with better facial fidelity and higher efficiency than previous methods.

**Keywords:** talking head synthesis, 3D Gaussian Splatting

**Introduction:** Synthesizing audio-driven talking head videos is valuable for many digital applications. Radiance fields such as Neural Radiance Fields (NeRF) have been widely used because they stabilize the 3D head structure and render photorealistically. However, representing facial motion by directly modifying point appearance can distort dynamic regions. TalkingGaussian instead uses deformation-based radiance fields, in which facial motions are smooth deformations applied to persistent Gaussian primitives; this simplifies the learning task and improves facial fidelity.

**Method:** TalkingGaussian decomposes the model into Persistent Gaussian Fields and Grid-based Motion Fields. The Persistent Gaussian Fields maintain a stable head structure, while the Grid-based Motion Fields represent facial motions through point-wise deformations (a minimal sketch of this scheme is shown below). An incremental sampling strategy facilitates smooth learning of complex facial motions, and a Face-Mouth Decomposition module improves the quality of lip synchronization and mouth reconstruction.
**Experiments:** TalkingGaussian is evaluated under a self-reconstruction setting and a lip-synchronization setting. The results show that it outperforms existing methods in image quality, motion quality, and efficiency; qualitative evaluations and user studies further confirm the effectiveness and visual appeal of the synthesized talking heads.

**Conclusion:** TalkingGaussian is a novel deformation-based framework for high-quality 3D talking head synthesis. By maintaining a persistent head structure and representing facial motions with deformations, it produces accurate, lip-synchronized videos with better facial fidelity and higher efficiency than previous methods.