Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

11 Jul 2024 | Yu Deng, Duomin Wang, and Baoyuan Wang
This paper proposes a novel learning approach for feedforward one-shot 4D head avatar synthesis. Unlike existing methods that rely on 3DMM-based reconstruction for supervision, it learns a 4D head synthesizer from pseudo multi-view videos in a data-driven manner, avoiding the limitations of 3DMM. The key idea is to first train a 3D head synthesizer on synthetic multi-view images and use it to convert monocular videos into multi-view ones, then learn the 4D head synthesizer from these pseudo multi-view videos via cross-view self-reenactment.

The method uses a simple vision transformer backbone with motion-aware cross-attentions and achieves superior reconstruction fidelity, geometry consistency, and motion control accuracy. The paper also discusses the advantages of pseudo multi-view data: it scales with readily available monocular video collections, makes multi-view supervision feasible from monocular inputs, and enables geometry learning beyond what 3DMM can represent. Evaluated on several benchmarks, the method shows significant improvements over previous approaches in reconstruction fidelity, identity similarity, and expression and pose control accuracy. The paper concludes that the proposed approach offers novel insights into integrating 3D priors with 2D data for improved 4D head avatar creation.
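The summary mentions a vision transformer backbone with motion-aware cross-attentions but gives no implementation details. As a rough, dependency-free sketch of what such a cross-attention layer could look like — where appearance (identity) tokens attend to motion tokens extracted from a driving frame — the following is illustrative only; all names, shapes, and the query/key role assignment are assumptions, not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def motion_aware_cross_attention(appearance_tokens, motion_tokens,
                                 d_head=32, seed=0):
    """Single-head cross-attention sketch: appearance tokens form the
    queries and attend to motion tokens (keys/values), injecting the
    driving frame's motion into the identity features.

    appearance_tokens: (N_app, d_app) source-image features
    motion_tokens:     (N_mot, d_mot) driving-frame motion features
    """
    rng = np.random.default_rng(seed)
    d_app = appearance_tokens.shape[-1]
    d_mot = motion_tokens.shape[-1]

    # Hypothetical learned projections, stood in for here by fixed
    # random matrices so the sketch runs without a training loop.
    Wq = rng.standard_normal((d_app, d_head)) / np.sqrt(d_app)
    Wk = rng.standard_normal((d_mot, d_head)) / np.sqrt(d_mot)
    Wv = rng.standard_normal((d_mot, d_app)) / np.sqrt(d_mot)

    Q = appearance_tokens @ Wq          # (N_app, d_head)
    K = motion_tokens @ Wk              # (N_mot, d_head)
    V = motion_tokens @ Wv              # (N_mot, d_app)

    # Scaled dot-product attention over motion tokens.
    attn = softmax(Q @ K.T / np.sqrt(d_head), axis=-1)  # (N_app, N_mot)

    # Residual connection, as is typical in transformer blocks.
    return appearance_tokens + attn @ V

# Toy usage: 16 appearance tokens (dim 64), 4 motion tokens (dim 48).
app = np.random.default_rng(1).standard_normal((16, 64))
mot = np.random.default_rng(2).standard_normal((4, 48))
out = motion_aware_cross_attention(app, mot)
```

The output keeps the appearance-token shape, so the layer can be stacked inside a standard transformer block; in a real model the projection matrices would be learned parameters and the attention would typically be multi-headed.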