HumanSplat is a novel, generalizable method for 3D human reconstruction from a single image. It integrates a 2D multi-view diffusion model and a latent reconstruction Transformer, leveraging human structure priors to achieve high-fidelity texture modeling and efficient reconstruction. The method addresses the limitations of existing approaches by directly inferring Gaussian properties from a single input image, eliminating the need for per-instance optimization or densely captured images. Key contributions include:
1. **Generalizable Gaussian Splatting**: HumanSplat predicts 3D Gaussian properties from a single image, achieving state-of-the-art rendering quality.
2. **Integrated Priors**: It combines 2D appearance priors from a generative diffusion model and 3D geometric priors from the SMPL model within a unified framework.
3. **Semantic Cues**: It enhances reconstruction quality by incorporating semantic cues and hierarchical supervision, improving the fidelity of detailed areas like the face and hands.
4. **Efficiency**: The method achieves fast reconstruction times, making it practical for real-world applications.
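To make "Gaussian properties" concrete, the sketch below decodes one raw prediction vector into the attributes used by standard 3D Gaussian Splatting: position, scale, rotation, opacity, and color, with the covariance built as Σ = R S Sᵀ Rᵀ. This summary does not describe HumanSplat's actual output head, so the 14-number layout, the choice of activations, and the names `decode_gaussian` and `quat_to_rotmat` are assumptions made for illustration only.

```python
import numpy as np


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize so the result is a pure rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def decode_gaussian(raw):
    """Map a raw 14-vector to Gaussian attributes.

    Hypothetical layout (an assumption, not taken from HumanSplat):
    [0:3] position, [3:6] log-scale, [6:10] quaternion,
    [10] opacity logit, [11:14] RGB logits.
    """
    raw = np.asarray(raw, dtype=float)
    position = raw[0:3]
    scale = np.exp(raw[3:6])              # exp keeps each axis scale positive
    rotation = quat_to_rotmat(raw[6:10])
    opacity = sigmoid(raw[10])            # squashed into (0, 1)
    color = sigmoid(raw[11:14])           # squashed into (0, 1) per channel

    # Broadcasting scales column j of R by scale[j], i.e. R @ diag(scale).
    m = rotation * scale
    covariance = m @ m.T                  # R S S^T R^T, symmetric positive definite

    return {
        "position": position,
        "scale": scale,
        "rotation": rotation,
        "opacity": opacity,
        "color": color,
        "covariance": covariance,
    }


# Usage: an all-zero prediction with an identity quaternion.
raw = np.zeros(14)
raw[6] = 1.0  # quaternion (1, 0, 0, 0)
gaussian = decode_gaussian(raw)
```

Parameterizing the covariance through a rotation and per-axis scales, rather than predicting its six free entries directly, guarantees a valid (symmetric, positive definite) covariance for any network output.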
Experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat outperforms existing methods in both quality and efficiency, providing robust performance even for challenging poses and loose clothing. The method opens up potential applications in various fields, including social media, gaming, and telepresence.