[slides] Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

Meta 3D AssetGen is a text-to-3D generation system that produces 3D meshes with detailed geometry, high-fidelity textures, and physically-based rendering (PBR) materials. It works in two stages: a text-to-image stage generates four views of the object with shaded and albedo channels, and an image-to-3D stage reconstructs the 3D shape and PBR materials from those views. The system uses a signed distance function (SDF) for efficient shape representation and a texture refinement transformer to enhance texture quality. Because the output carries PBR materials, it supports realistic relighting and accurate material decomposition.

AssetGen applies to both text-to-3D and image-to-3D tasks; the latter uses a novel PBR-based sparse-view reconstruction model and a texture refiner. The system is trained on a large dataset of 3D meshes with a combination of losses, including PBR and albedo rendering losses. It is implemented with efficient kernels and generates a 3D asset in under 30 seconds.

In evaluation, AssetGen outperforms existing methods in mesh accuracy, texture quality, and material control, in both sparse-view reconstruction and text-to-3D generation. It achieves a 17% improvement in Chamfer Distance and a 40% improvement in LPIPS over the best concurrent work for few-view reconstruction, and a 72% human preference over the best industry competitors in visual quality and text alignment.
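For readers unfamiliar with the two geometric notions the summary relies on, the following is a minimal sketch in Python with NumPy. It is not AssetGen's code: `sphere_sdf` and `chamfer_distance` are hypothetical helper names, the sphere is only a toy shape, and the Chamfer Distance here is assumed to be the sum of the two mean nearest-neighbour Euclidean distances. The paper's exact convention (for example, squared distances or a different normalisation) may differ.

```python
import numpy as np


def sphere_sdf(points: np.ndarray, radius: float = 1.0) -> np.ndarray:
    """Signed distance to a sphere centred at the origin.

    Negative inside the surface, zero on it, positive outside.
    """
    return np.linalg.norm(points, axis=-1) - radius


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumed convention: mean nearest-neighbour Euclidean distance from a to b
    plus the same quantity from b to a.
    """
    # Pairwise Euclidean distances via broadcasting, shape (N, M).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(dist.min(axis=1).mean() + dist.min(axis=0).mean())


# Toy data: random points on the unit sphere and a copy scaled by 1.1.
rng = np.random.default_rng(0)
pts = rng.normal(size=(256, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
scaled = 1.1 * pts

print(sphere_sdf(scaled)[:3])         # each value is close to 0.1
print(chamfer_distance(pts, scaled))  # close to 0.2 (0.1 in each direction)
```

A lower Chamfer Distance means the reconstructed surface samples lie closer to the reference surface samples, which is why the summary reports the 17% figure as an improvement.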

Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

2 Jul 2024 | Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny