21 Jun 2024 | Chubin Zhang, Hongliang Song, Yi Wei, Yu Chen, Jiwen Lu, Yansong Tang
The paper introduces the Geometry-Aware Large Reconstruction Model (GeoLRM), a novel approach for generating high-quality 3D assets using 512k Gaussians and 21 input images with only 11 GB of GPU memory. Unlike previous methods that neglect the sparsity of 3D structures and fail to utilize explicit geometric relationships between 3D and 2D images, GeoLRM incorporates a 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms to integrate image features into 3D representations. The model is implemented through a two-stage pipeline: a lightweight proposal network generates a sparse set of 3D anchor points from input images, and a specialized reconstruction transformer refines the geometry and retrieves textural details.
Experimental results demonstrate that GeoLRM significantly outperforms existing models, especially for dense view inputs, and showcases its practical applicability in 3D generation tasks. The model's effectiveness is further validated through quantitative and qualitative evaluations on the Google Scanned Objects (GSO) dataset, highlighting its superior performance in handling complex 3D reconstructions.
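The explicit 3D-to-2D relationship mentioned in the summary can be illustrated with a minimal sketch: each 3D anchor point is projected into every input view with a pinhole camera model, and an image feature is read at the projected pixel. This is not the authors' implementation. The names `View`, `project_point`, `bilinear_sample`, and `gather_anchor_features` are hypothetical, and plain averaging across views stands in for the learned sampling offsets and attention weights that a deformable cross-attention layer would use.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class View:
    """One input image (hypothetical container): world-to-camera pose, intrinsics, features."""
    rotation: Sequence[Sequence[float]]       # 3x3 world-to-camera rotation
    translation: Sequence[float]              # 3-vector world-to-camera translation
    focal: float                              # focal length in pixels (same for x and y)
    principal: Tuple[float, float]            # principal point (cx, cy) in pixels
    features: List[List[List[float]]]         # features[row][col] -> list of C channels


def project_point(p_world, rotation, translation, focal, principal):
    """Pinhole projection of a world point. Returns (u, v, depth) or None if behind the camera."""
    x, y, z = (
        sum(rotation[i][j] * p_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        return None
    return focal * x / z + principal[0], focal * y / z + principal[1], z


def bilinear_sample(features, u, v) -> Optional[List[float]]:
    """Bilinearly interpolate a feature at pixel (u, v). Assumes a map of at least 2x2."""
    h, w = len(features), len(features[0])
    if not (0 <= u <= w - 1 and 0 <= v <= h - 1):
        return None  # the projection falls outside the image
    u0, v0 = min(int(u), w - 2), min(int(v), h - 2)
    du, dv = u - u0, v - v0
    channels = len(features[0][0])
    out = []
    for k in range(channels):
        top = features[v0][u0][k] * (1 - du) + features[v0][u0 + 1][k] * du
        bottom = features[v0 + 1][u0][k] * (1 - du) + features[v0 + 1][u0 + 1][k] * du
        out.append(top * (1 - dv) + bottom * dv)
    return out


def gather_anchor_features(anchors, views):
    """For each anchor, average the features sampled from all views that see it.

    Averaging is a stand-in: a deformable cross-attention layer would instead
    predict offsets around the projected pixel and weight the samples.
    """
    gathered = []
    for point in anchors:
        samples = []
        for view in views:
            proj = project_point(point, view.rotation, view.translation,
                                 view.focal, view.principal)
            if proj is None:
                continue
            feat = bilinear_sample(view.features, proj[0], proj[1])
            if feat is not None:
                samples.append(feat)
        if samples:
            channels = len(samples[0])
            gathered.append([sum(s[k] for s in samples) / len(samples)
                             for k in range(channels)])
        else:
            gathered.append(None)  # anchor not visible in any view
    return gathered
```

Because only the sparse anchor points are projected and sampled, the cost of this step scales with the number of anchors rather than with a dense voxel grid, which is the kind of saving the summary attributes to exploiting 3D sparsity.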