HumanSplat is a generalizable method for single-image human reconstruction that predicts 3D Gaussian Splatting properties directly from one input image, eliminating the need for per-instance optimization or densely captured views. The key insight is to reconstruct the Gaussian properties from the diffusion latent space in an end-to-end architecture: a 2D multi-view diffusion model serves as an appearance prior, a human parametric model serves as a structure prior, and a latent reconstruction Transformer ties them together, enabling high-fidelity texture modeling and accurate 3D reconstruction. A hierarchical loss incorporating human semantic information further enhances reconstruction quality, addressing challenges such as capturing fine details in visually sensitive areas and balancing robustness with flexibility.

Trained on high-fidelity human scans, HumanSplat achieves state-of-the-art photorealistic novel-view synthesis, outperforming existing methods in PSNR and LPIPS on standard benchmarks and producing high-quality results on in-the-wild images. Inference takes approximately 9.3 seconds on a single NVIDIA A100 GPU, and the method is efficient enough to apply across a variety of scenarios. Remaining limitations include handling intricate garments and accessories, further increasing computational speed, and animating the reconstructed human models.
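To make the two core ideas concrete, here is a minimal numpy sketch of (a) a head that maps per-pixel latent features to 3D Gaussian Splatting properties and (b) a semantically weighted "hierarchical" reconstruction loss. All names, dimensions, and the random linear weights are hypothetical placeholders; in the actual method a trained latent reconstruction Transformer would produce these predictions, and the loss would operate on rendered images.

```python
import numpy as np

def predict_gaussian_params(latents, rng):
    """Hypothetical linear head mapping per-pixel latent features to
    Gaussian Splatting properties: position offset, scale, rotation
    (quaternion), opacity, and color. Random weights stand in for a
    trained network."""
    n, d = latents.shape
    out_dim = 3 + 3 + 4 + 1 + 3          # offset, scale, quat, opacity, RGB
    w = rng.standard_normal((d, out_dim)) * 0.01
    raw = latents @ w
    offset = raw[:, 0:3]
    scale = np.exp(raw[:, 3:6])          # exp keeps scales positive
    quat = raw[:, 6:10]
    quat = quat / (np.linalg.norm(quat, axis=1, keepdims=True) + 1e-8)
    opacity = 1.0 / (1.0 + np.exp(-raw[:, 10:11]))   # sigmoid -> (0, 1)
    color = raw[:, 11:14]
    return offset, scale, quat, opacity, color

def hierarchical_loss(pred, target, region_weights, region_ids):
    """Semantically weighted L2 loss: pixels in visually sensitive
    regions (e.g. face, hands) receive larger weights."""
    w = region_weights[region_ids]       # per-pixel weight from its region
    return float(np.mean(w[:, None] * (pred - target) ** 2))

rng = np.random.default_rng(0)
latents = rng.standard_normal((1024, 64))            # 1024 pixels, 64-dim latents
offset, scale, quat, opacity, color = predict_gaussian_params(latents, rng)

region_ids = rng.integers(0, 3, size=1024)           # 0=body, 1=face, 2=hands
weights = np.array([1.0, 4.0, 4.0])                  # emphasize face and hands
target = rng.standard_normal((1024, 3))
loss = hierarchical_loss(color, target, weights, region_ids)
```

The activation choices (exp for scale, sigmoid for opacity, unit-normalized quaternions) are the standard way to keep predicted Gaussian parameters in their valid ranges; the specific region set and weights above are illustrative only.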
Future work aims to address these limitations and further improve the method's capabilities.