4 Mar 2024 | Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, Yan-Pei Cao
TripoSR is a fast 3D reconstruction model that generates a high-quality 3D mesh from a single image in under 0.5 seconds. It builds on the LRM architecture and improves on it in data processing, model design, and training technique. The model is transformer-based and has three components: an image encoder, an image-to-triplane decoder, and a triplane-based neural radiance field (NeRF). It is trained on the Objaverse dataset, with improvements to data curation and rendering. On the model and training side, the changes include triplane channel optimization, a mask loss that reduces artifacts and improves reconstruction fidelity, and local rendering supervision. The model does not require precise camera information, which helps it adapt to a wide range of real-world inputs. On public datasets, TripoSR achieves state-of-the-art performance, outperforming other open-source alternatives both quantitatively and qualitatively, with more detailed and accurate shapes and textures. It is released under the MIT license together with source code, a pretrained model, and an interactive demo, so that researchers, developers, and creatives can build on it to advance 3D generative AI, computer vision, and computer graphics.
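To make the triplane-based NeRF and the mask loss mentioned in the summary more concrete, the following is a minimal, dependency-free sketch and not TripoSR's actual code. It assumes three axis-aligned feature planes (XY, XZ, YZ) stored as nested lists, points normalized to [-1, 1], per-plane features combined by concatenation, and a mask loss written as binary cross-entropy between rendered opacity and a ground-truth foreground mask. The function names `bilinear_sample`, `query_triplane`, and `mask_bce_loss` are hypothetical and chosen only for illustration.

```python
import math


def bilinear_sample(plane, u, v):
    """Bilinearly sample a square feature plane at (u, v) in [-1, 1].

    `plane[row][col]` is a list of C feature channels (an assumed layout).
    `u` indexes columns and `v` indexes rows.
    """
    res = len(plane)
    # Clamp to the plane and map [-1, 1] to continuous pixel coordinates.
    x = (min(max(u, -1.0), 1.0) + 1.0) * 0.5 * (res - 1)
    y = (min(max(v, -1.0), 1.0) + 1.0) * 0.5 * (res - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    tx, ty = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - tx) + plane[y0][x1][c] * tx
        bottom = plane[y1][x0][c] * (1 - tx) + plane[y1][x1][c] * tx
        out.append(top * (1 - ty) + bottom * ty)
    return out


def query_triplane(planes, point):
    """Project a 3D point onto the three planes and concatenate the features.

    Concatenation is an assumption of this sketch; other triplane models
    sum the three feature vectors instead.
    """
    x, y, z = point
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    return f_xy + f_xz + f_yz  # list concatenation, length 3 * C


def mask_bce_loss(pred_opacity, gt_mask, eps=1e-6):
    """Mean binary cross-entropy between rendered opacity and a 0/1 mask."""
    total = 0.0
    for p, m in zip(pred_opacity, gt_mask):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= m * math.log(p) + (1.0 - m) * math.log(1.0 - p)
    return total / len(pred_opacity)


# Toy 2x2 single-channel plane, reused for all three orientations.
toy_plane = [[[0.0], [1.0]], [[2.0], [3.0]]]
toy_planes = {"xy": toy_plane, "xz": toy_plane, "yz": toy_plane}

features = query_triplane(toy_planes, (0.0, 0.0, 0.0))  # [1.5, 1.5, 1.5]
loss = mask_bce_loss([0.5, 0.5], [1.0, 0.0])            # ln(2), about 0.693
```

In a full pipeline, the concatenated feature would typically be passed to a small MLP that predicts color and density for volume rendering, and the mask term would be added to the image reconstruction loss with a weighting factor. Both steps are omitted here because the summary does not specify them.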