RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Mar 2025 | Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi
RealmDreamer is a text-driven 3D scene generation method that leverages inpainting and depth diffusion models to produce high-quality 3D scenes. It represents scenes with 3D Gaussian Splatting (3DGS), optimized against pretrained diffusion models to match complex text prompts. The key innovation is conditioning a 2D inpainting diffusion model on an initial scene estimate, which provides low-variance supervision for unknown regions during 3D distillation. This appearance signal is combined with geometric distillation from a depth diffusion model, conditioned on samples from the inpainting model.
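To make the loop concrete, here is a minimal, hypothetical PyTorch sketch of this kind of distillation. It is not the authors' code: `ToyGaussians`, `inpaint_sample`, and `depth_sample` are stand-ins (a learnable image instead of a real differentiable 3DGS rasterizer, and constant fills instead of real inpainting and depth diffusion models), chosen only to show how occlusion masks, inpainted color targets, and depth targets fit together during optimization.

```python
import torch
import torch.nn.functional as F

H = W = 64  # toy resolution

class ToyGaussians(torch.nn.Module):
    """Stand-in for a 3D Gaussian Splatting scene: a learnable image so the
    loop runs end to end. A real system renders 3DGS differentiably from a
    sampled camera and returns color plus an opacity/coverage map."""
    def __init__(self):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(3, H, W))

    def render(self, camera):
        rgb = self.logits.sigmoid()
        opacity = torch.zeros(1, H, W)
        opacity[:, :, : W // 2] = 1.0  # pretend the right half is uncovered
        return rgb, opacity

def inpaint_sample(rgb, mask, prompt):
    """Stub for a 2D inpainting diffusion model conditioned on the current
    render: fills masked (unseen) pixels with prompt-aligned content.
    Here a constant fill stands in for a real diffusion sample."""
    with torch.no_grad():
        return torch.where(mask > 0.5, torch.full_like(rgb, 0.8), rgb)

def depth_sample(rgb):
    """Stub for a depth diffusion model conditioned on the inpainted image;
    a real model would return metrically plausible monocular depth."""
    with torch.no_grad():
        return rgb.mean(0, keepdim=True)

scene = ToyGaussians()
opt = torch.optim.Adam(scene.parameters(), lr=1e-2)

for step in range(100):
    camera = None  # a sampled novel viewpoint in a real system
    rgb, opacity = scene.render(camera)

    # Unknown regions = pixels the current scene estimate does not cover.
    unknown = (opacity < 0.5).float()

    # Low-variance supervision: one inpainted sample per view acts as a
    # pixel-space target, instead of high-variance per-step score gradients.
    target_rgb = inpaint_sample(rgb.detach(), unknown, "a cozy library")
    target_depth = depth_sample(target_rgb)

    pred_depth = rgb.mean(0, keepdim=True)  # stand-in for rendered depth
    loss = F.mse_loss(rgb, target_rgb) + 0.5 * F.mse_loss(pred_depth, target_depth)

    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design point the sketch illustrates is that both targets are computed without gradients and held fixed for the step, so the 3DGS parameters are pulled toward a single coherent sample per view rather than averaged over the diffusion model's full output distribution.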
RealmDreamer achieves state-of-the-art results with parallax, detailed appearance, and realistic geometry. In a comprehensive user study it outperforms existing approaches, with a 95.5% preference over ProlificDreamer, and it shows superior results on several quantitative metrics for text-based 3D scene generation. Compared against state-of-the-art techniques such as DreamFusion, ProlificDreamer, Text2Room, and LucidDreamer, it delivers significant improvements in rendering quality and scene consistency. The method synthesizes high-quality 3D scenes in a variety of styles with complex layouts, and it generalizes to single-image-to-3D generation, producing realistic 3D scenes from a single image and a text prompt without requiring video or multi-view data. The technique is implemented in PyTorch3D and uses Stable Diffusion for outpainting, and a comprehensive set of experiments and ablation studies demonstrates the effectiveness of each proposed contribution.