10 Aug 2024 | Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy
LN3DIFF is a novel framework for efficient, high-quality 3D generation that addresses the absence of a unified 3D diffusion pipeline. It applies a 3D-aware variational autoencoder (VAE) to encode input images into a structured, compact 3D latent space, which a transformer-based decoder then maps to a high-capacity 3D neural field. Training a diffusion model on this 3D-aware latent space yields strong conditional 3D generation results on datasets such as Objaverse, ShapeNet, and FFHQ. LN3DIFF outperforms existing 3D diffusion methods in inference speed and requires no per-instance optimization. The method is also view-efficient, needing only two views per instance during training, and achieves state-of-the-art performance in both 3D generation and monocular 3D reconstruction. Its support for conditional 3D generation across diverse datasets makes it a versatile tool for 3D vision and graphics tasks.
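To make the two-stage pipeline concrete, here is a minimal, hypothetical PyTorch sketch of the idea described above: a VAE encodes an image into a compact set of 3D-aware latent tokens, a transformer decodes those tokens into neural-field features, and a diffusion model is trained on the latents with a standard DDPM epsilon-prediction loss. All class names, dimensions, and the toy denoiser are illustrative assumptions, not the paper's actual architecture; the paper's volumetric rendering of the neural field is omitted entirely.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentVAE3D(nn.Module):
    """Hypothetical sketch: image -> compact 3D-aware latent tokens -> field features."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(                      # toy convolutional image encoder
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(16),                      # -> (B, 128, 16, 16)
        )
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)  # transformer-based decoder
        self.to_field = nn.Linear(latent_dim, 32)          # per-token neural-field features

    def encode(self, img):
        h = self.encoder(img).flatten(2).transpose(1, 2)   # (B, 256 tokens, 128)
        return self.to_mu(h), self.to_logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def decode(self, z):
        return self.to_field(self.decoder(z))              # (B, 256, 32); rendering omitted

class EpsNet(nn.Module):
    """Toy denoiser over latent tokens; the real model is a far larger latent-diffusion network."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, zt, t):
        # Crude scalar timestep conditioning appended per token (illustrative only).
        tt = t.float().view(-1, 1, 1).expand(-1, zt.size(1), 1) / 1000.0
        return self.net(torch.cat([zt, tt], dim=-1))

def diffusion_loss(eps_model, z0, T=1000):
    """Standard DDPM epsilon-prediction loss applied to the 3D-aware latent."""
    B = z0.size(0)
    t = torch.randint(0, T, (B,), device=z0.device)
    betas = torch.linspace(1e-4, 0.02, T, device=z0.device)
    abar = torch.cumprod(1.0 - betas, dim=0)[t].view(B, 1, 1)  # cumulative noise schedule
    eps = torch.randn_like(z0)
    zt = abar.sqrt() * z0 + (1.0 - abar).sqrt() * eps          # forward noising of latents
    return F.mse_loss(eps_model(zt, t), eps)

# Usage: stage 1 encodes an input view into latents; stage 2 trains diffusion on them.
vae, eps_net = LatentVAE3D(), EpsNet()
img = torch.randn(2, 3, 64, 64)            # stand-in for a posed input view
mu, logvar = vae.encode(img)
z = vae.reparameterize(mu, logvar)
loss = diffusion_loss(eps_net, z.detach()) # diffusion operates in the frozen latent space
print(loss.item(), vae.decode(z).shape)    # neural-field features per latent token
```

The key design choice this sketch illustrates is that the diffusion model never touches pixels or raw 3D volumes: it operates on a compact latent, which is what makes inference fast and avoids per-instance optimization.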