GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

2024 | Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G. Schwing, Shenlong Wang
GoMAvatar is a method for creating efficient, high-fidelity, animatable human models from a single monocular video. It uses a hybrid 3D representation, Gaussians-on-Mesh (GoM), that combines the rendering quality and speed of Gaussian splatting with the geometry modeling and graphics-pipeline compatibility of deformable meshes. The approach renders novel views and poses in real time at state-of-the-art quality while remaining compact (3.63 MB per subject) and fast (43 FPS), and it integrates directly with standard graphics pipelines such as OpenGL.

The method learns a canonical Gaussians-on-Mesh representation from the monocular video and articulates it to the observation space for real-time rendering. The representation consists of a set of vertices and a set of faces, where each face stores the indices of its vertices together with the rotation, scale, and color of the Gaussian attached to it. Rendering decomposes the RGB image into a pseudo albedo map, produced by Gaussian splatting, and a pseudo shading map, produced by mesh rasterization; a minimal sketch of this pipeline follows below.
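The following Python sketch illustrates the data structure and rendering decomposition just described. All names here (`GoMRepresentation`, `articulate`, `splat_gaussians`, `rasterize_mesh`, `render`) are hypothetical illustrations, not the authors' API; the two rendering passes are stubbed out with constant maps, whereas the actual method uses differentiable Gaussian splatting and mesh rasterization, and articulation in the paper is driven by a skinned body model (approximated here by generic per-vertex transforms).

```python
# Minimal sketch of the Gaussians-on-Mesh (GoM) representation and the
# pseudo-albedo x pseudo-shading composition. Names and signatures are
# illustrative assumptions, not the authors' implementation.
from dataclasses import dataclass
import numpy as np

@dataclass
class GoMRepresentation:
    vertices: np.ndarray   # (V, 3) canonical vertex positions
    faces: np.ndarray      # (F, 3) indices of the vertices of each face
    rotations: np.ndarray  # (F, 4) per-face Gaussian rotation (quaternion)
    scales: np.ndarray     # (F, 3) per-face Gaussian scale
    colors: np.ndarray     # (F, 3) per-face Gaussian color

    def gaussian_means(self) -> np.ndarray:
        # One Gaussian per face, centered at the face centroid.
        return self.vertices[self.faces].mean(axis=1)  # (F, 3)

def articulate(gom: GoMRepresentation, vertex_transforms: np.ndarray) -> GoMRepresentation:
    """Warp canonical vertices into the observation space.

    vertex_transforms: (V, 4, 4) per-vertex rigid transforms, e.g. from
    linear blend skinning of an SMPL-style body model (assumption).
    """
    homo = np.concatenate([gom.vertices, np.ones((len(gom.vertices), 1))], axis=1)
    warped = np.einsum("vij,vj->vi", vertex_transforms, homo)[:, :3]
    return GoMRepresentation(warped, gom.faces, gom.rotations, gom.scales, gom.colors)

def splat_gaussians(gom: GoMRepresentation, camera, hw=(4, 4)) -> np.ndarray:
    # Stand-in for differentiable Gaussian splatting of per-face colors
    # into a pseudo albedo map; a real renderer would project each
    # Gaussian (mean, rotation, scale) and alpha-blend front to back.
    return np.broadcast_to(gom.colors.mean(axis=0), (*hw, 3)).copy()

def rasterize_mesh(gom: GoMRepresentation, camera, hw=(4, 4)) -> np.ndarray:
    # Stand-in for differentiable mesh rasterization producing a
    # monochrome pseudo shading map.
    return np.ones((*hw, 1))

def render(gom: GoMRepresentation, camera) -> np.ndarray:
    """Compose the final RGB image from the two intermediate maps."""
    albedo = splat_gaussians(gom, camera)   # pseudo albedo (Gaussian splatting)
    shading = rasterize_mesh(gom, camera)   # pseudo shading (mesh rasterization)
    return albedo * shading                 # element-wise composition into RGB

# Usage: identity articulation of a single canonical triangle.
gom = GoMRepresentation(
    vertices=np.eye(3), faces=np.array([[0, 1, 2]]),
    rotations=np.array([[1.0, 0.0, 0.0, 0.0]]), scales=np.full((1, 3), 0.01),
    colors=np.array([[0.8, 0.6, 0.5]]),
)
image = render(articulate(gom, np.tile(np.eye(4), (3, 1, 1))), camera=None)
```

Binding the Gaussians to mesh faces is what makes the representation both splat-renderable and exportable: the same vertices and faces can be handed to a rasterization pipeline such as OpenGL, which is the compatibility the summary above refers to.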
GoMAvatar is evaluated on the ZJU-MoCap, PeopleSnapshot, and YouTube video datasets, where it matches or surpasses current monocular human modeling algorithms in rendering quality and significantly outperforms them in computational efficiency, reaching 43 FPS on an NVIDIA A100 GPU at a memory cost of only 3.63 MB per subject. In novel view and pose synthesis it renders realistic details and handles challenging cases such as self-penetration, and it generalizes to in-the-wild videos while producing high-quality geometry. Extensive experiments show that it outperforms other state-of-the-art methods in rendering quality, inference speed, and geometry accuracy, demonstrating that GoMAvatar is a promising approach for efficient, high-fidelity, animatable human modeling from monocular video.