GPAvatar: Generalizable and Precise Head Avatar from Image(s)

18 Jan 2024 | Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, Tatsuya Harada
GPAvatar is a novel framework for reconstructing 3D head avatars from one or several images in a single forward pass, achieving strong generalization and precise expression control. The key contributions include:

1. **Dynamic Point-based Expression Field (PEF)**: A point-cloud-driven field that captures facial expressions, enabling natural and fine-grained expression control.
2. **Multi Tri-planes Attention (MTA) Module**: Integrates information from multiple input images, improving synthesis quality and expression control.
3. **Canonical Feature Encoder**: Constructs the canonical feature space with a tri-planes representation, leveraging strong 3D geometric priors.

The method addresses challenges in multi-view consistency, non-facial information, and generalization to new identities. Experiments on the VFHQ and HDTF datasets demonstrate superior performance in synthesis quality, expression control, and cross-identity reenactment. The framework is designed with ethical considerations, including watermarks and restrictions on real-person synthesis, to prevent misuse. A minimal sketch of the tri-plane machinery these contributions build on is given below.
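The summary names tri-planes and multi-tri-plane attention but gives no implementation detail, so the following is an illustrative sketch rather than the paper's code. It shows the two generic operations the contributions rest on: sampling per-point features from a tri-plane (as in EG3D-style representations), and fusing point features from several input images. The function names (`sample_triplane`, `fuse_multi_triplane`), tensor shapes, and the norm-based fusion score are all assumptions made for illustration; the actual MTA module presumably uses learned query/key projections.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample features for 3D points from a tri-plane.

    planes: (3, C, H, W) feature maps for the XY, XZ, and YZ planes.
    points: (N, 3) coordinates normalized to [-1, 1].
    Returns: (N, C) per-point features, summed over the three planes.
    """
    # Project each 3D point onto the three orthogonal planes.
    coords = torch.stack(
        [points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]]], dim=0
    )                                   # (3, N, 2)
    grid = coords.unsqueeze(2)          # (3, N, 1, 2), as grid_sample expects
    feats = F.grid_sample(planes, grid, mode="bilinear", align_corners=True)
    return feats.squeeze(-1).permute(0, 2, 1).sum(dim=0)  # (N, C)

def fuse_multi_triplane(per_image_feats: torch.Tensor) -> torch.Tensor:
    """Toy attention-style fusion over point features from K input images.

    per_image_feats: (K, N, C) features sampled from K per-image tri-planes.
    Returns: (N, C) fused features. The softmax over a feature-norm score
    stands in for the learned attention weights a real MTA module would use.
    """
    scores = per_image_feats.norm(dim=-1, keepdim=True)  # (K, N, 1)
    weights = torch.softmax(scores, dim=0)
    return (weights * per_image_feats).sum(dim=0)

# Example usage with random tensors (shapes are illustrative only):
planes_per_image = torch.randn(4, 3, 32, 256, 256)   # K=4 input images
pts = torch.rand(1024, 3) * 2 - 1                    # query points in [-1, 1]^3
per_image = torch.stack([sample_triplane(p, pts) for p in planes_per_image])
fused = fuse_multi_triplane(per_image)               # (1024, 32)
```

The design intuition behind such fusion is that each input image constrains a different part of the head, so attention lets every 3D query point draw most heavily on whichever view observed it best, rather than averaging all views uniformly.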