HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

21 Dec 2024 | Zhenglin Zhou, Fan Ma, Hehe Fan, Zongxin Yang, and Yi Yang
HeadStudio is a framework that generates realistic, animatable 3D head avatars from text prompts using 3D Gaussian splatting. It combines an animatable head prior model with 3D Gaussian splatting so that semantic animation can be applied to a high-quality 3D representation. The optimization is strengthened through initialization, distillation, and regularization, which lets shape, texture, and animation be learned jointly. The resulting avatars render in real time (≥40 fps) and can be driven smoothly by speech and video. Extensive experiments show that HeadStudio outperforms existing methods at generating dynamic avatars from text, in both fidelity and rendering speed.
The avatars can be rendered at 1024 resolution and support semantic alignment and smooth expression deformation. The method also introduces adaptive geometry regularization to improve texture representation and rendering efficiency. Training takes only 2 hours on a single NVIDIA A6000 GPU. Potential applications include augmented and virtual reality, as well as text-to-3D generation and animation.
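The summary does not say how the Gaussians are tied to the head prior model. As a rough illustration only, the sketch below assumes one common way of making splats follow an animated mesh: each Gaussian center is stored as barycentric weights on a mesh triangle, so it moves whenever the triangle's vertices move. The `BoundGaussian` class, the `gaussian_center` function, and the toy one-triangle mesh are all hypothetical and are not taken from HeadStudio.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BoundGaussian:
    """A Gaussian center tied to one mesh triangle (hypothetical structure)."""
    triangle: int                      # index of the parent triangle
    bary: Tuple[float, float, float]   # barycentric weights, expected to sum to 1


def gaussian_center(g: BoundGaussian,
                    vertices: Sequence[Vec3],
                    faces: Sequence[Tuple[int, int, int]]) -> Vec3:
    """Recompute the center from the current (possibly deformed) mesh vertices."""
    i, j, k = faces[g.triangle]
    w0, w1, w2 = g.bary
    a, b, c = vertices[i], vertices[j], vertices[k]
    # Weighted average of the three triangle corners, per coordinate.
    return tuple(w0 * a[d] + w1 * b[d] + w2 * c[d] for d in range(3))


# Toy "mesh": a single triangle in a neutral pose.
faces = [(0, 1, 2)]
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# The same triangle after an assumed expression change lifts vertex 2.
deformed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]

g = BoundGaussian(triangle=0, bary=(0.25, 0.25, 0.5))
center_neutral = gaussian_center(g, neutral, faces)
center_deformed = gaussian_center(g, deformed, faces)
print(center_neutral, center_deformed)
```

Because the Gaussian stores only its triangle index and weights, animating the mesh is enough to animate the splat. A full system would also need to update each Gaussian's rotation and scale, which this sketch omits.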