HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation

15 Jan 2024 | Antoine Mercier, Ramin Nakhli*, Mahesh Reddy, Rajeev Yasarla, Hong Cai, Fatih Porikli, Guillaume Berger
HexaGen3D is a novel approach that leverages pre-trained text-to-image models to generate high-quality textured 3D meshes from textual prompts in just 7 seconds. The method, detailed in the paper by Antoine Mercier et al. from Qualcomm AI Research, addresses the challenge of generating 3D assets efficiently and with high quality, which remains difficult despite advances in generative modeling.

HexaGen3D fine-tunes a pre-trained text-to-image model to predict six orthographic projections and a corresponding latent tri-plane representation, which is then decoded into a textured mesh. This approach requires no per-sample optimization and generalizes well to new objects or compositions not encountered during fine-tuning. Key contributions include "Orthographic Hexaview guidance," a technique that aligns the model's 2D prior knowledge with 3D spatial reasoning, and the reuse of pre-trained models such as StableDiffusion for efficient 3D asset generation. HexaGen3D outperforms existing methods in both quality and speed, making it a promising tool for 3D content creation.
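To make the hexaview-to-triplane step more concrete, below is a minimal, hypothetical sketch in PyTorch of how six orthographic view latents could be mapped to three axis-aligned feature planes and then queried at 3D points. The module names, tensor shapes, channel counts, and the single-convolution mapping layer are illustrative assumptions for exposition, not the authors' implementation or network architecture.

```python
# Hypothetical sketch of a hexaview -> tri-plane mapping and tri-plane query.
# Shapes, layer choices, and names are assumptions, not the HexaGen3D code.
import torch
import torch.nn as nn

class HexaviewToTriplane(nn.Module):
    """Maps six orthographic-view latents (B, 6, C, H, W) to three
    axis-aligned feature planes (B, 3, Cp, H, W) with a small conv head."""
    def __init__(self, in_channels=4, plane_channels=32):
        super().__init__()
        # The six views are stacked along channels; one conv predicts 3 planes.
        self.head = nn.Conv2d(6 * in_channels, 3 * plane_channels,
                              kernel_size=3, padding=1)
        self.plane_channels = plane_channels

    def forward(self, hexaview_latents):
        b, v, c, h, w = hexaview_latents.shape          # v == 6 views
        x = hexaview_latents.reshape(b, v * c, h, w)    # stack views channel-wise
        planes = self.head(x)                           # (B, 3*Cp, H, W)
        return planes.reshape(b, 3, self.plane_channels, h, w)

def sample_triplane(planes, points):
    """Query tri-plane features at 3D points in [-1, 1]^3.
    planes: (B, 3, Cp, H, W); points: (B, N, 3). Returns (B, N, 3*Cp)."""
    feats = []
    # Project points onto the XY, XZ, and YZ planes and bilinearly sample.
    for i, dims in enumerate([(0, 1), (0, 2), (1, 2)]):
        uv = points[..., dims].unsqueeze(1)             # (B, 1, N, 2)
        f = torch.nn.functional.grid_sample(
            planes[:, i], uv, align_corners=False)      # (B, Cp, 1, N)
        feats.append(f.squeeze(2).transpose(1, 2))      # (B, N, Cp)
    return torch.cat(feats, dim=-1)

if __name__ == "__main__":
    # Toy usage with random tensors standing in for the predicted latents.
    latents = torch.randn(1, 6, 4, 64, 64)
    planes = HexaviewToTriplane()(latents)
    pts = torch.rand(1, 1024, 3) * 2 - 1
    print(sample_triplane(planes, pts).shape)           # torch.Size([1, 1024, 96])
```

In the actual method, the sampled tri-plane features would feed a decoder that produces the textured mesh; the sketch only illustrates how per-view 2D predictions can be rearranged into a 3D-queryable representation.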