CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

8 Mar 2024 | Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu
This paper presents the Convolutional Reconstruction Model (CRM), a feed-forward method for generating high-fidelity 3D textured meshes from a single image. CRM exploits the spatial alignment between the input multi-view images and the triplane representation to produce high-resolution 3D content.

The pipeline proceeds in two stages. First, a multi-view diffusion model generates six orthographic images and corresponding canonical coordinate maps (CCMs) from the input image. Second, these images and CCMs are fed into a convolutional U-Net, which produces a rolled-out triplane that is reshaped into a triplane. The triplane is decoded into SDF values, texture colors, and Flexicubes parameters; because the Flexicubes geometry representation supports direct gradient-based mesh optimization, the model is trained end-to-end and outputs the textured mesh directly as its final result.

Trained on a large-scale 3D dataset, CRM produces a high-quality textured mesh in about 10 seconds without any test-time optimization, at a significantly lower training cost (and with a smaller batch size) than transformer-based reconstruction models. Evaluated on the GSO dataset against existing baselines, CRM shows superior geometry and texture quality under quantitative metrics including Chamfer Distance, Volume IoU, and F-Score.
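To make the U-Net stage concrete, here is a minimal sketch of how the six generated views and their CCMs might be assembled into a single convolutional input. The 256x256 resolution and the channel-concatenation layout are assumptions for illustration, not the paper's exact configuration.

```python
import torch

# Illustrative shapes only: six orthographic RGB views and six canonical
# coordinate maps (CCMs, XYZ coordinates stored as three channels) at 256x256.
views = torch.rand(6, 3, 256, 256)   # outputs of the multi-view diffusion model
ccms  = torch.rand(6, 3, 256, 256)   # corresponding canonical coordinate maps

# One simple way to feed all twelve maps to a convolutional U-Net is
# channel-wise concatenation into a single (1, 36, 256, 256) tensor;
# the paper's actual input layout may differ.
unet_input = torch.cat([views, ccms], dim=0).reshape(1, 36, 256, 256)
print(unet_input.shape)  # torch.Size([1, 36, 256, 256])
```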
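The core decoding step, querying the reshaped triplane at 3D points and mapping features to SDF, color, and Flexicubes parameters, can be sketched as below. The side-by-side plane layout, the feature and hidden widths, and the single-scalar Flexicubes stub are assumptions; the paper's decoder heads and Flexicubes parameterization are more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriplaneDecoder(nn.Module):
    """Minimal sketch of triplane feature querying and decoding.

    Assumptions (not from the paper): the rolled-out triplane lays the three
    planes side by side along the width axis, and a small shared MLP produces
    SDF (1), RGB (3), and a per-point Flexicubes weight stub (1).
    """
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * feat_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1 + 3 + 1),  # SDF, RGB, Flexicubes weight
        )

    def forward(self, rolled_out, points):
        # rolled_out: (B, C, H, 3W) U-Net output; points: (B, N, 3) in [-1, 1]
        planes = rolled_out.chunk(3, dim=-1)      # split into XY, XZ, YZ planes
        coords = [points[..., [0, 1]],            # project each 3D point onto
                  points[..., [0, 2]],            # the corresponding plane
                  points[..., [1, 2]]]
        feats = []
        for plane, uv in zip(planes, coords):
            grid = uv.unsqueeze(1)                # (B, 1, N, 2) for grid_sample
            f = F.grid_sample(plane, grid, align_corners=True)  # (B, C, 1, N)
            feats.append(f.squeeze(2).transpose(1, 2))          # (B, N, C)
        out = self.mlp(torch.cat(feats, dim=-1))
        sdf, rgb, flex_w = out[..., :1], out[..., 1:4], out[..., 4:]
        return sdf, torch.sigmoid(rgb), flex_w
```

In this sketch the per-point features from the three planes are concatenated before decoding; summing them is an equally common choice in triplane architectures.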
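For reference, the quantitative metrics mentioned above can be computed from sampled surface points and voxelized occupancy grids roughly as follows. This is a generic sketch of the standard definitions, not the paper's evaluation code; the threshold `tau` is an illustrative value.

```python
import torch

def chamfer_and_fscore(pred_pts, gt_pts, tau=0.05):
    """Chamfer Distance and F-Score between two surface point clouds.

    pred_pts: (N, 3), gt_pts: (M, 3). tau is the F-Score distance
    threshold; 0.05 is illustrative, not the paper's setting.
    """
    d = torch.cdist(pred_pts, gt_pts)     # (N, M) pairwise Euclidean distances
    d_pred = d.min(dim=1).values          # each predicted point to nearest GT
    d_gt = d.min(dim=0).values            # each GT point to nearest prediction
    chamfer = d_pred.mean() + d_gt.mean()
    precision = (d_pred < tau).float().mean()
    recall = (d_gt < tau).float().mean()
    fscore = 2 * precision * recall / (precision + recall + 1e-8)
    return chamfer.item(), fscore.item()

def volume_iou(occ_a, occ_b):
    """Volume IoU between two boolean occupancy grids of the same shape."""
    inter = (occ_a & occ_b).float().sum()
    union = (occ_a | occ_b).float().sum()
    return (inter / union.clamp(min=1)).item()
```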