3D Gaussian Blendshapes for Head Avatar Animation

July 27-August 1, 2024, Denver, CO, USA | Shengjie Ma, Yanlin Weng, Tianjia Shao, Kun Zhou
The paper introduces a 3D Gaussian blendshape representation for animating photorealistic head avatars. From a monocular video, the method learns a base head model of the neutral expression and a set of expression blendshapes, all represented as 3D Gaussians. Each Gaussian carries properties such as position, rotation, and color, and the blended model is rendered into high-fidelity avatar animations in real time with Gaussian splatting.

The key contribution is enforcing semantic consistency between the Gaussian blendshapes and their corresponding mesh blendshapes: an optimization process constrains the differences among the Gaussian blendshapes to agree with the differences among the mesh blendshapes. Evaluated on several datasets against NeRF-based and point-based methods, the approach captures more high-frequency detail and renders substantially faster, reaching about 370 fps for 70K Gaussians. The paper also discusses limitations and future directions, including better generalization to novel views and handling of deformable hair.
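As with classic mesh blendshapes, animation here amounts to combining a neutral base with weighted expression deltas, only applied to Gaussian attributes instead of vertex positions. The sketch below illustrates that idea in Python; the class name, attribute layout, and quaternion re-normalization are illustrative assumptions, not the authors' actual implementation or data format.

```python
import numpy as np

# Hypothetical sketch of blendshape-style animation over 3D Gaussian attributes.
class GaussianBlendshapeAvatar:
    def __init__(self, neutral, deltas):
        """
        neutral: dict of per-Gaussian attributes of the neutral base model,
                 e.g. {'position': (N, 3), 'rotation': (N, 4), 'color': (N, 3)}.
        deltas:  list of K dicts, one per expression blendshape, holding the
                 per-attribute differences from the neutral model.
        """
        self.neutral = neutral
        self.deltas = deltas

    def blend(self, weights):
        """Combine neutral attributes with weighted blendshape deltas.

        weights: length-K array of expression coefficients (e.g. tracked from video).
        Returns a dict of blended attributes to be rasterized by a Gaussian splatting renderer.
        """
        blended = {k: v.copy() for k, v in self.neutral.items()}
        for w, delta in zip(weights, self.deltas):
            if w == 0.0:
                continue  # skip inactive expressions
            for k in blended:
                blended[k] += w * delta[k]
        # Rotations stored as quaternions are re-normalized after linear blending
        # (a common simplification; not necessarily the paper's exact scheme).
        blended['rotation'] /= np.linalg.norm(blended['rotation'], axis=-1, keepdims=True)
        return blended


# Toy usage with random data: N Gaussians, K expression blendshapes.
N, K = 70_000, 52
rng = np.random.default_rng(0)
neutral = {
    'position': rng.normal(size=(N, 3)),
    'rotation': rng.normal(size=(N, 4)),
    'color': rng.uniform(size=(N, 3)),
}
deltas = [{k: 0.01 * rng.normal(size=v.shape) for k, v in neutral.items()} for _ in range(K)]
avatar = GaussianBlendshapeAvatar(neutral, deltas)
frame_attrs = avatar.blend(rng.uniform(0, 1, size=K))
```

Because the per-frame work is a handful of weighted array additions followed by splatting, this formulation is what makes real-time rates on the order of hundreds of frames per second plausible.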