11 Jul 2024 | Yu Deng, Duomin Wang, and Baoyuan Wang
The paper introduces a novel learning approach for one-shot 4D head avatar synthesis using pseudo multi-view data. Unlike existing methods that rely on monocular videos and 3D Morphable Models (3DMMs) for reconstruction, the proposed method employs pseudo multi-view videos to learn a 4D head synthesizer in a data-driven manner. The key idea is to first learn a 3D head synthesizer from synthetic multi-view images and use it to convert monocular real videos into multi-view ones. The pseudo multi-view videos are then used to train a 4D head synthesizer via cross-view self-reenactment. The method leverages a vision transformer backbone with motion-aware cross-attentions, achieving superior reconstruction fidelity, geometry consistency, and motion control accuracy compared to previous methods. The authors hope that their method will inspire future work on integrating 3D priors with 2D supervision for improved 4D head avatar creation.
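The motion-aware cross-attention mentioned above can be illustrated with a minimal sketch: source-image (identity) tokens from the ViT backbone act as queries attending over a small set of motion tokens derived from the driving frame, so appearance features are modulated by the target expression and pose. This is a hedged NumPy illustration under assumed names and shapes, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def motion_aware_cross_attention(id_tokens, motion_tokens, Wq, Wk, Wv):
    """Identity tokens query motion tokens (single head, no output
    projection); a residual connection keeps appearance information."""
    Q = id_tokens @ Wq                                        # (N, d)
    K = motion_tokens @ Wk                                    # (M, d)
    V = motion_tokens @ Wv                                    # (M, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)   # (N, M)
    return id_tokens + attn @ V                               # residual update

rng = np.random.default_rng(0)
d = 8
id_tokens = rng.standard_normal((16, d))     # hypothetical ViT patch tokens of the source image
motion_tokens = rng.standard_normal((4, d))  # hypothetical motion embedding of the driving frame
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = motion_aware_cross_attention(id_tokens, motion_tokens, Wq, Wk, Wv)
print(out.shape)  # (16, 8)
```

In cross-view self-reenactment, a loss of this form would drive one pseudo view with the motion of another view of the same clip, supervising both identity preservation and motion transfer.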