BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis

18 Nov 2024 | Lutao Jiang, Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang
BrightDreamer is an end-to-end feed-forward framework for fast text-to-3D synthesis that generates a 3D object in 77 ms. It addresses the inefficiency of existing per-prompt optimization methods by generating 3D Gaussians directly from a text prompt, with no iterative optimization.

The framework formulates generation as estimating a 3D deformation from an anchor shape with predefined positions. A Text-guided Shape Deformation (TSD) network predicts the deformed shape; its new positions serve as the centers of the 3D Gaussians. A Text-guided Triplane Generator (TTG) then generates a triplane representation of the 3D object. The center of each Gaussian is used to look up spatial features, which are transformed into the remaining four attributes: scaling, rotation, opacity, and SH coefficients. The generated 3D Gaussians can be rendered at 705 frames per second.

BrightDreamer is trained on a diverse set of text prompts and is designed to generate 3D content from unseen prompts. It uses 2D image diffusion models to improve training efficiency and generalization, and its design allows interpolation between different prompts, enabling creative exploration. The summary reports strong semantic understanding and generalization, even for complex text prompts, and describes BrightDreamer as the first method to achieve fast and generalizable text-to-3D synthesis using 3D Gaussian Splatting.

The framework is evaluated against per-prompt optimization, amortized text-to-NeRF, and text-to-image-to-3DGS approaches, and is reported to improve on them in generation speed, quality, and generalization. It is also tested for word-level generalizability, demonstrating its ability to handle unseen words.
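The data flow described above (anchor positions, text-guided deformation, triplane lookup, attribute decoding) can be illustrated with a small standard-library Python sketch. This is not the authors' implementation: the deformation function, the random triplane, and the linear heads are hypothetical placeholders for the learned TSD network, TTG output, and decoders. Summing the three plane features and the activations used here (exponential for scaling, normalization for the rotation quaternion, sigmoid for opacity) are assumptions borrowed from common triplane and 3D Gaussian Splatting practice, not details stated in this summary.

```python
import math
import random


def bilinear(plane, u, v):
    """Bilinearly sample a feature vector from a square grid at (u, v) in [-1, 1]."""
    res = len(plane)
    # Map [-1, 1] to continuous grid coordinates and clamp to the grid.
    x = min(max((u + 1.0) * 0.5 * (res - 1), 0.0), res - 1.0)
    y = min(max((v + 1.0) * 0.5 * (res - 1), 0.0), res - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    fx, fy = x - x0, y - y0
    return [
        (1 - fx) * (1 - fy) * plane[y0][x0][c] + fx * (1 - fy) * plane[y0][x1][c]
        + (1 - fx) * fy * plane[y1][x0][c] + fx * fy * plane[y1][x1][c]
        for c in range(len(plane[0][0]))
    ]


def sample_triplane(planes, center):
    """Sum the features sampled from the XY, XZ and YZ planes (assumed aggregation)."""
    x, y, z = center
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


def toy_deformation(anchor, text_embedding, max_offset=0.1):
    """Hypothetical stand-in for the TSD network: a bounded, text-dependent offset."""
    bias = sum(text_embedding) / len(text_embedding)
    return tuple(a + max_offset * math.tanh(a + bias) for a in anchor)


def linear(weights, feat):
    """Apply a bias-free linear layer given as a list of weight rows."""
    return [sum(w * f for w, f in zip(row, feat)) for row in weights]


def decode_attributes(feat, heads):
    """Turn one feature vector into the four non-position Gaussian attributes."""
    quat = linear(heads["rot"], feat)
    norm = math.sqrt(sum(q * q for q in quat)) or 1.0
    return {
        "scaling": [math.exp(s) for s in linear(heads["scale"], feat)],  # positive
        "rotation": [q / norm for q in quat],                            # unit quaternion
        "opacity": 1.0 / (1.0 + math.exp(-linear(heads["opacity"], feat)[0])),
        "sh": linear(heads["sh"], feat),  # degree-0 SH only (one RGB triple)
    }


def generate_gaussians(text_embedding, grid=4, res=8, channels=4, seed=0):
    rng = random.Random(seed)
    # Anchor shape: a regular grid of predefined positions in [-1, 1]^3.
    ticks = [-1 + 2 * i / (grid - 1) for i in range(grid)]
    anchors = [(x, y, z) for x in ticks for y in ticks for z in ticks]
    # Random placeholders for the text-conditioned triplane and the decoder heads.
    planes = {
        name: [[[rng.uniform(-1, 1) for _ in range(channels)]
                for _ in range(res)] for _ in range(res)]
        for name in ("xy", "xz", "yz")
    }
    heads = {
        name: [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(out)]
        for name, out in (("scale", 3), ("rot", 4), ("opacity", 1), ("sh", 3))
    }
    gaussians = []
    for anchor in anchors:
        center = toy_deformation(anchor, text_embedding)  # deformed position
        attrs = decode_attributes(sample_triplane(planes, center), heads)
        attrs["center"] = center
        gaussians.append(attrs)
    return gaussians


gaussians = generate_gaussians(text_embedding=[0.2, -0.5, 0.1])
print(len(gaussians), sorted(gaussians[0]))
```

Every step in the sketch is a single pass with no optimization loop, which is the property that makes a feed-forward generator fast compared with per-prompt optimization. In a real system the placeholders would be trained networks conditioned on an actual text encoder.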
Overall, BrightDreamer provides a robust and efficient solution for text-to-3D synthesis, with strong performance in both quality and speed.