LN3DIFF: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation


10 Aug 2024 | Yushi Lan¹, Fangzhou Hong¹, Shuai Yang², Shangchen Zhou¹, Xuyi Meng¹, Bo Dai³, Xingang Pan¹, and Chen Change Loy¹
LN3DIFF is a framework for scalable latent neural fields diffusion that enables fast, high-quality 3D generation. A variational autoencoder (VAE) encodes input images into a structured, compact, 3D-aware latent space, and a transformer-based decoder maps these latents to a high-capacity 3D neural field. Training a diffusion model in this 3D-aware latent space supports high-quality monocular 3D reconstruction as well as text-to-3D synthesis.

The design targets scalability, efficiency, and generalizability in 3D diffusion: the 3D-aware reconstruction model encodes 3D data in an amortized manner, and the resulting compact latent space makes diffusion training efficient and keeps the framework compatible with advances in 3D representations. Evaluated on ShapeNet, FFHQ, and Objaverse, LN3DIFF achieves state-of-the-art reconstruction quality on the ShapeNet benchmark, supports conditional 3D generation across diverse datasets, and outperforms existing 3D diffusion methods in inference speed while improving generation quality. A sketch of the three-stage pipeline is given below.
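The following is a minimal, illustrative sketch of the pipeline described above: an image-to-latent VAE encoder, a transformer-based decoder that produces a triplane-style neural field, and a note on where the latent diffusion model fits. All module names, shapes, and hyperparameters here are assumptions for exposition, not the authors' implementation.

```python
# Illustrative LN3DIFF-style pipeline sketch (assumed shapes and names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageToLatentEncoder(nn.Module):
    """Maps a posed input image to a compact 3D-aware latent grid (VAE stage)."""
    def __init__(self, in_ch=3, latent_ch=12, latent_res=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, 2 * latent_ch, 3, padding=1),  # predicts mean and log-variance
        )
        self.latent_res = latent_res

    def forward(self, img):
        h = F.adaptive_avg_pool2d(self.net(img), self.latent_res)
        mean, logvar = h.chunk(2, dim=1)
        # VAE reparameterization; the KL term regularizes the latent space.
        z = mean + torch.randn_like(mean) * torch.exp(0.5 * logvar)
        kl = 0.5 * (mean.pow(2) + logvar.exp() - 1 - logvar).mean()
        return z, kl

class TransformerTriplaneDecoder(nn.Module):
    """Transformer decoder that lifts the latent into three axis-aligned
    feature planes (a triplane neural field) for volume rendering."""
    def __init__(self, latent_ch=12, plane_ch=32, plane_res=128, depth=4):
        super().__init__()
        self.proj_in = nn.Linear(latent_ch, 256)
        layer = nn.TransformerEncoderLayer(256, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.proj_out = nn.Linear(256, 3 * plane_ch)
        self.plane_ch, self.plane_res = plane_ch, plane_res

    def forward(self, z):
        b, c, h, w = z.shape
        tokens = z.flatten(2).transpose(1, 2)        # (B, H*W, C) latent tokens
        tokens = self.blocks(self.proj_in(tokens))
        planes = self.proj_out(tokens)               # (B, H*W, 3*plane_ch)
        planes = planes.transpose(1, 2).reshape(b, 3, self.plane_ch, h, w)
        planes = F.interpolate(planes.flatten(0, 1), size=self.plane_res,
                               mode="bilinear", align_corners=False)
        return planes.unflatten(0, (b, 3))           # (B, 3, plane_ch, R, R)

if __name__ == "__main__":
    img = torch.randn(2, 3, 256, 256)                # batch of input views
    z, kl = ImageToLatentEncoder()(img)              # stage 1: image -> 3D-aware latent
    planes = TransformerTriplaneDecoder()(z)         # stage 2: latent -> triplane field
    print(z.shape, planes.shape, kl.item())
    # Stage 3 (not shown): train a standard latent diffusion model (e.g. a UNet
    # with a DDPM noise schedule, optionally text-conditioned) on z, then decode
    # sampled latents with the frozen decoder and volume-render novel views.
```

Training the VAE and decoder once, then running diffusion only in the small latent grid, is what keeps inference fast: sampling never touches image-resolution volumes, and the decoder amortizes the cost of producing the neural field.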