HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

30 Oct 2024 | Panwang Pan*, Zhuo Su†‡, Chenguo Lin*†‡, Zhen Fan†, Yongjie Zhang†, Zeming Li†, Tingting Shen‡, Yadong Mu‡, Yebin Liu‡
HumanSplat is a novel, generalizable method for 3D human reconstruction from a single image. It integrates a 2D multi-view diffusion model and a latent reconstruction Transformer, leveraging human structure priors to achieve high-fidelity texture modeling and efficient reconstruction. The method addresses the limitations of existing approaches by directly inferring Gaussian properties from a single input image, eliminating the need for per-instance optimization or densely captured images.

Key contributions include:

1. **Generalizable Gaussian Splatting**: HumanSplat predicts 3D Gaussian properties from a single image, achieving state-of-the-art rendering quality.
2. **Integrated Priors**: It combines 2D appearance priors from a generative diffusion model and 3D geometric priors from the SMPL model within a unified framework.
3. **Semantic Cues**: It enhances reconstruction quality by incorporating semantic cues and hierarchical supervision, improving the fidelity of detailed areas like the face and hands.
4. **Efficiency**: The method achieves fast reconstruction times, making it practical for real-world applications.

Experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat outperforms existing methods in both quality and efficiency, providing robust performance even for challenging poses and loose clothing. The method opens up potential applications in various fields, including social media, gaming, and telepresence.
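
To make "Gaussian properties" concrete, the sketch below shows the parameterization commonly used in 3D Gaussian Splatting, where each Gaussian has a per-axis scale and a rotation quaternion that together define its covariance as Σ = R S Sᵀ Rᵀ. This summary does not specify HumanSplat's exact parameterization, so treat this as an assumption; the function names `quat_to_rotmat` and `gaussian_covariance` are hypothetical and not taken from the authors' code.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize so the result is a valid rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def gaussian_covariance(scale, quat):
    """Build the 3x3 covariance Sigma = R S S^T R^T of one Gaussian.

    scale: per-axis standard deviations (3 values); an assumed parameterization.
    quat:  rotation as a quaternion (w, x, y, z).
    """
    R = quat_to_rotmat(quat)
    S = np.diag(np.asarray(scale, dtype=float))
    M = R @ S
    return M @ M.T  # symmetric positive semi-definite by construction

# Usage: an axis-aligned Gaussian, then the same Gaussian rotated 90 degrees about z.
cov_identity = gaussian_covariance([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0])
half = np.sqrt(0.5)
cov_rotated = gaussian_covariance([1.0, 2.0, 3.0], [half, 0.0, 0.0, half])
```

Factoring the covariance into scale and rotation keeps it positive semi-definite for any predicted values, which is one reason this form suits a network that regresses Gaussian attributes directly.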