20 Nov 2024 | Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, László A. Jeni, Sergey Tulyakov, Hsin-Ying Lee
4Real is a novel framework for generating near-photorealistic dynamic 4D scenes from text prompts. The method models scenes with deformable 3D Gaussian Splats (D-3DGS), allowing the scene to be rendered at any timestep from arbitrary camera poses. Unlike existing methods that rely on multi-view generative models trained on synthetic object datasets, 4Real uses video generative models trained on diverse real-world data, which improves photorealism and structural integrity. The pipeline proceeds in three stages: generating a reference video from the text prompt, reconstructing a canonical 3D representation from a selected frame, and learning temporal deformations of that canonical representation to capture the scene's dynamics. This design offers more flexible use cases, more diverse results, and lower computational cost than previous methods. The approach is evaluated through user studies and quantitative metrics, demonstrating superior performance across multiple aspects of scene generation.
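To make the D-3DGS idea concrete, here is a minimal sketch of a canonical-plus-deformation representation, assuming PyTorch. The class names, network sizes, and the choice to deform only Gaussian centers are illustrative assumptions for this sketch, not the authors' implementation, and rasterization of the Gaussians into images is omitted.

```python
# Minimal sketch of a deformable 3D Gaussian Splat (D-3DGS) scene:
# a static set of canonical Gaussians plus a time-conditioned deformation
# network that warps them to any queried timestep.
import torch
import torch.nn as nn

class CanonicalGaussians(nn.Module):
    """Canonical (static) scene: one set of learnable 3D Gaussian parameters."""
    def __init__(self, num_gaussians: int):
        super().__init__()
        self.means = nn.Parameter(torch.randn(num_gaussians, 3))       # centers
        self.log_scales = nn.Parameter(torch.zeros(num_gaussians, 3))  # per-axis extent
        self.rotations = nn.Parameter(torch.randn(num_gaussians, 4))   # quaternions
        self.opacities = nn.Parameter(torch.zeros(num_gaussians, 1))   # pre-sigmoid
        self.colors = nn.Parameter(torch.rand(num_gaussians, 3))       # RGB

class DeformationField(nn.Module):
    """MLP mapping (canonical position, time) to a positional offset."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # offset applied to Gaussian means
        )

    def forward(self, means: torch.Tensor, t: float) -> torch.Tensor:
        time = torch.full((means.shape[0], 1), t, device=means.device)
        return self.mlp(torch.cat([means, time], dim=-1))

def gaussians_at_time(scene: CanonicalGaussians,
                      deform: DeformationField,
                      t: float) -> torch.Tensor:
    """Warp canonical Gaussian centers to timestep t in [0, 1]."""
    # Other attributes (rotation, opacity) could be deformed analogously.
    return scene.means + deform(scene.means, t)

# Usage: query the dynamic scene at an arbitrary timestep.
scene = CanonicalGaussians(num_gaussians=10_000)
deform = DeformationField()
means_t = gaussians_at_time(scene, deform, t=0.5)  # centers at mid-sequence
```

Factoring the scene into a canonical snapshot plus a learned deformation is what lets a single 3D reconstruction (from one selected reference frame) be reused across all timesteps, rather than reconstructing every frame independently.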