GVGEN: Text-to-3D Generation with Volumetric Representation

16 Jul 2024 | Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He
GVGEN is a diffusion-based framework for text-to-3D generation that efficiently produces 3D Gaussian representations from text input. It introduces two key innovations:

(1) Structured Volumetric Representation. Unorganized 3D Gaussian points are arranged into a structured GaussianVolume, which captures intricate texture details with a fixed number of Gaussians. A pruning and densifying method, the Candidate Pool Strategy, improves detail fidelity through selective optimization.

(2) Coarse-to-Fine Generation Pipeline. The model first constructs a basic geometric structure and then predicts the complete Gaussian attributes, enabling detailed 3D geometry generation.

GVGEN is evaluated on the Objaverse-LVIS dataset. In both qualitative and quantitative assessments it outperforms existing 3D generation methods, and it generates an asset in about 7 seconds, balancing quality and efficiency. Ablation studies confirm the effectiveness of the proposed strategies, with GVGEN outperforming other methods in rendering quality and generation diversity.

The paper also discusses limitations: generating 3D assets from text that diverges significantly from the training domain remains challenging, and there is a trade-off between computational resources and rendering quality. Overall, GVGEN offers a fast and efficient method for generating high-quality 3D assets from text.
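To make the idea of a structured volume with a fixed number of Gaussians more concrete, here is a minimal sketch in Python with NumPy. It is illustrative only and not the authors' implementation: the channel layout (`CHANNELS = 14`), the function names `make_volume` and `gaussian_centers`, the `max_offset` parameter, and the choice to place each Gaussian at its voxel center plus a bounded offset are all assumptions made for this example. Real volumes would come from fitting or from the generative model, not from random numbers.

```python
import numpy as np

# Hypothetical channel layout (an assumption, not taken from the summary):
# 3 offset + 3 scale + 4 rotation quaternion + 1 opacity + 3 RGB color.
CHANNELS = 14


def make_volume(n: int, seed: int = 0) -> np.ndarray:
    """Return a random (n, n, n, CHANNELS) volume: one Gaussian per voxel."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(n, n, n, CHANNELS)).astype(np.float32)


def gaussian_centers(volume: np.ndarray, max_offset: float = 0.5) -> np.ndarray:
    """Voxel center plus a bounded offset, in unit-cube coordinates."""
    n = volume.shape[0]
    # Centers of the n^3 voxels that tile the unit cube.
    axis = (np.arange(n, dtype=np.float32) + 0.5) / n
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    # Clip the first three channels and scale them so that each Gaussian
    # moves at most max_offset voxel widths along each axis.
    offset = np.clip(volume[..., :3], -1.0, 1.0) * (max_offset / n)
    return (grid + offset).reshape(-1, 3)


volume = make_volume(n=8)
centers = gaussian_centers(volume)
print(volume.shape, centers.shape)  # (8, 8, 8, 14) (512, 3)
```

Because the Gaussian count is fixed at n^3 and every Gaussian has a known grid slot, the whole object is a dense tensor, which is the kind of input that volumetric networks can process directly. With `max_offset=0.5`, each center stays inside its own voxel.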