July 27-August 1, 2024 | Shengjie Ma, Yanlin Weng, Tianjia Shao, Kun Zhou
This paper introduces a 3D Gaussian blendshape representation for head avatar animation. The method learns a base model of the neutral expression and a group of expression blendshapes, each corresponding to a basis expression in classical parametric face models. Both the neutral model and the expression blendshapes are represented as 3D Gaussians whose properties encode the avatar's appearance. An avatar with an arbitrary expression is generated by linearly blending the neutral model and the expression blendshapes with expression coefficients, and high-fidelity head avatar animations are then synthesized in real time via Gaussian splatting.

The model is trained from a monocular video. Mesh blendshapes are first constructed with previous methods, and Gaussians are distributed on the mesh surfaces as initialization. All Gaussian properties are then jointly optimized while enforcing semantic consistency between the Gaussian blendshapes and the mesh blendshapes.

Compared to state-of-the-art methods, the Gaussian blendshape representation better captures high-frequency details in the input video and achieves superior rendering performance. Evaluated on the INSTA dataset and a custom dataset, it outperforms prior work on PSNR, SSIM, and LPIPS, and also performs well in novel-view extrapolation and cross-identity reenactment. Rendering runs at 370 fps with moderate training and memory cost. The representation captures fine facial details, recovers eyeball movement, handles exaggerated expressions, and models the mouth interior and hair, which traditional mesh blendshapes do not cover, making it well suited to real-time animation and virtual reality applications.
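The linear blending at the core of the representation can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the attribute names, dictionary layout, and the convention that blendshapes store offsets from the neutral model are assumptions.

```python
import numpy as np

def blend_gaussians(neutral, deltas, weights):
    """Linearly blend a neutral Gaussian model with expression blendshapes.

    neutral: dict of per-Gaussian attribute arrays, e.g.
             {"xyz": (N,3), "scale": (N,3), "rot": (N,4), "opacity": (N,1)}
             (names and shapes are illustrative assumptions)
    deltas:  list of B dicts with the same keys, each storing one
             blendshape's offset from the neutral model
    weights: (B,) expression coefficients, e.g. from a face tracker
    """
    blended = {}
    for key, base in neutral.items():
        # weighted sum of per-blendshape offsets, added to the neutral base
        offset = sum(w * d[key] for w, d in zip(weights, deltas))
        blended[key] = base + offset
    return blended
```

In a full pipeline, blended quaternion rotations would typically be re-normalized before splatting, since a linear combination of unit quaternions is not itself unit-length.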
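Distributing Gaussians on the mesh surfaces for initialization amounts to sampling points on a triangle mesh. A common way to do this, sketched here under the assumption of uniform area-weighted sampling (the paper's exact placement strategy may differ), is to pick triangles proportionally to area and draw uniform barycentric coordinates:

```python
import numpy as np

def sample_on_mesh(vertices, faces, n_samples, rng=None):
    """Sample candidate Gaussian centers uniformly over a triangle mesh.

    vertices: (V, 3) float array of vertex positions
    faces:    (F, 3) int array of triangle vertex indices
    Returns an (n_samples, 3) array of points on the surface.
    """
    rng = np.random.default_rng(rng)
    tris = vertices[faces]  # (F, 3, 3)
    # triangle areas via the cross product of two edge vectors
    areas = 0.5 * np.linalg.norm(
        np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0]), axis=1)
    idx = rng.choice(len(faces), size=n_samples, p=areas / areas.sum())
    # uniform barycentric coordinates (square-root trick)
    u, v = rng.random(n_samples), rng.random(n_samples)
    su = np.sqrt(u)
    b0, b1, b2 = 1.0 - su, su * (1.0 - v), su * v
    t = tris[idx]
    return b0[:, None] * t[:, 0] + b1[:, None] * t[:, 1] + b2[:, None] * t[:, 2]
```

Each sampled point would then be turned into a Gaussian (position plus initial scale, rotation, opacity, and color) before joint optimization.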
The method also renders well from side views, and in reenactment it captures subtle and dynamic facial expressions while preserving the personal attributes of the target subject.