TripoSR: Fast 3D Object Reconstruction from a Single Image

4 Mar 2024 | Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, Yan-Pei Cao
This technical report introduces TripoSR, a 3D reconstruction model that leverages a transformer architecture to generate high-quality 3D meshes from a single image in under 0.5 seconds. Building on the LRM network architecture, TripoSR incorporates significant improvements in data processing, model design, and training techniques. Evaluations on public datasets such as GSO and OmniObject3D demonstrate superior performance, both quantitatively and qualitatively, compared to other open-source alternatives.

**Model Overview:**
- **Core Components:** TripoSR comprises an image encoder, an image-to-triplane decoder, and a triplane-based neural radiance field (NeRF).
- **Image Encoder:** Initialized with a pre-trained vision transformer (DINOv1), it projects the RGB image into latent vectors.
- **Image-to-Triplane Decoder:** Transforms these latent vectors into a triplane-NeRF representation, a compact yet expressive 3D representation.
- **NeRF Model:** Multilayer perceptrons (MLPs) that predict the color and density of 3D points.

**Data Improvements:**
- **Data Curation:** Selects a curated subset of the Objaverse dataset to improve training data quality.
- **Data Rendering:** Emulates real-world image distributions to improve generalization.

**Model and Training Improvements:**
- **Triplane Channel Optimization:** Adjusts channel counts to balance reconstruction quality and computational efficiency.
- **Mask Loss:** Reduces "floater" artifacts and improves reconstruction fidelity.
- **Local Rendering Supervision:** Supervises on random patches of high-resolution renders to balance computational cost and reconstruction detail.

**Results:**
- **Quantitative Comparisons:** TripoSR outperforms state-of-the-art methods on Chamfer Distance (CD) and F-score (FS) metrics.
- **Performance vs. Runtime:** TripoSR achieves fast inference while maintaining high reconstruction quality.
- **Qualitative Results:** TripoSR reconstructs detailed, textured 3D shapes with higher quality and better details than competing methods.

**Conclusion:** TripoSR is an open-source feedforward 3D reconstruction model that combines a transformer architecture with substantial technical improvements. It demonstrates state-of-the-art performance and computational efficiency, aiming to empower researchers and developers advancing 3D generative AI.
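To make the triplane-NeRF representation concrete: each 3D point is projected onto three axis-aligned feature planes, the sampled features are aggregated, and a small MLP maps them to density and color. The following is a minimal NumPy sketch of that query step, not TripoSR's actual implementation; the plane resolution, channel count, feature aggregation by summation, and the toy two-layer MLP are all illustrative assumptions.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample a feature plane.

    plane: (H, W, C) feature grid; uv: (N, 2) coords in [-1, 1].
    Returns (N, C) features; border coordinates are clamped.
    """
    H, W, _ = plane.shape
    # Map [-1, 1] to pixel coordinates.
    x = (uv[:, 0] + 1.0) * 0.5 * (W - 1)
    y = (uv[:, 1] + 1.0) * 0.5 * (H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    return (plane[y0, x0] * (1 - wx) * (1 - wy)
            + plane[y0, x0 + 1] * wx * (1 - wy)
            + plane[y0 + 1, x0] * (1 - wx) * wy
            + plane[y0 + 1, x0 + 1] * wx * wy)

def query_triplane(planes, xyz, mlp_weights):
    """Query density and color for 3D points from a triplane.

    planes: dict with 'xy', 'xz', 'yz' grids of shape (H, W, C).
    xyz: (N, 3) points in [-1, 1]^3.
    mlp_weights: (W1, b1, W2, b2) for a toy 2-layer MLP (hypothetical;
    the real decoder MLP is deeper and learned).
    """
    # Project each point onto the three planes and aggregate features.
    feats = (bilinear_sample(planes['xy'], xyz[:, [0, 1]])
             + bilinear_sample(planes['xz'], xyz[:, [0, 2]])
             + bilinear_sample(planes['yz'], xyz[:, [1, 2]]))
    W1, b1, W2, b2 = mlp_weights
    h = np.maximum(feats @ W1 + b1, 0.0)      # ReLU hidden layer
    out = h @ W2 + b2                         # (N, 4): density + RGB logits
    sigma = np.maximum(out[:, 0], 0.0)        # non-negative density
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))   # sigmoid color in (0, 1)
    return sigma, rgb
```

These per-point densities and colors would then be composited along camera rays by standard NeRF volume rendering to produce images for supervision.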
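The mask loss and local rendering supervision are described above only at a high level. As a rough sketch under common conventions (not the paper's exact formulation), the mask loss can be a binary cross-entropy between the rendered opacity map and the ground-truth silhouette, and local supervision can crop random patches from a high-resolution target so only a patch must be rendered per step; the patch size and loss weighting here are placeholders.

```python
import numpy as np

def mask_loss(alpha_pred, mask_gt, eps=1e-6):
    """BCE between rendered opacity and the ground-truth silhouette.

    Penalizing opacity outside the object suppresses "floater" artifacts.
    alpha_pred: (H, W) accumulated opacities in [0, 1].
    mask_gt:    (H, W) binary foreground mask.
    """
    a = np.clip(alpha_pred, eps, 1.0 - eps)
    return float(-np.mean(mask_gt * np.log(a)
                          + (1.0 - mask_gt) * np.log(1.0 - a)))

def sample_patch(image, patch, rng):
    """Crop a random square patch from a high-resolution render target.

    Supervising on patches keeps per-step rendering cost bounded while
    still exposing the model to full-resolution detail.
    """
    H, W = image.shape[:2]
    y = rng.integers(0, H - patch + 1)
    x = rng.integers(0, W - patch + 1)
    return image[y:y + patch, x:x + patch], (y, x)
```

In training, the same patch coordinates would be used to render the predicted patch, so the pixel and mask losses compare aligned regions.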
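Chamfer Distance and F-score, the metrics used in the quantitative comparisons above, are standard point-cloud measures. A brute-force sketch follows; the distance threshold `tau` is illustrative and not necessarily the paper's evaluation setting.

```python
import numpy as np

def chamfer_and_fscore(pred, gt, tau=0.05):
    """Chamfer Distance and F-score between two point clouds.

    pred: (N, 3) predicted points; gt: (M, 3) ground-truth points.
    Brute-force O(N*M) pairwise distances, adequate for
    evaluation-sized clouds.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    d_pred_to_gt = d.min(axis=1)   # nearest GT point per prediction
    d_gt_to_pred = d.min(axis=0)   # nearest prediction per GT point
    # Symmetric Chamfer Distance: lower is better.
    cd = d_pred_to_gt.mean() + d_gt_to_pred.mean()
    # F-score at threshold tau: higher is better.
    precision = np.mean(d_pred_to_gt < tau)
    recall = np.mean(d_gt_to_pred < tau)
    if precision + recall == 0:
        return float(cd), 0.0
    return float(cd), float(2 * precision * recall / (precision + recall))
```

Identical clouds give CD = 0 and F-score = 1; shifting one cloud away from the other increases CD and drives the F-score down.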