MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

14 Jun 2024 | Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang
MeshAnything is a model designed to convert any 3D representation into Artist-Created Meshes (AMs), i.e., meshes with the kind of compact, well-structured topology that human artists produce. It addresses a key limitation of current mesh extraction methods, which often yield dense, inefficient meshes that are inferior to artist-made ones. By treating mesh extraction as a generation problem, MeshAnything produces AMs that align with a specified shape, substantially improving storage, rendering, and simulation efficiency while maintaining high precision.

The architecture consists of a VQ-VAE and a shape-conditioned decoder-only transformer. The VQ-VAE learns a discrete mesh vocabulary, and the transformer is trained on this vocabulary for shape-conditioned autoregressive mesh generation. Point clouds sampled from the target 3D assets serve as the shape condition and are injected into the transformer. A noise-resistant decoder is also developed to improve the quality of the generated meshes, making the model more robust to imperfect token sequences.

Experiments show that MeshAnything generates AMs with significantly fewer faces and more refined topology than existing extraction methods, while achieving shape-precision metrics close to or comparable with previous approaches. The model can be integrated into a variety of 3D asset production pipelines, broadening their applicability in the 3D industry. Limitations remain: the model cannot generate meshes for scenes beyond a certain size, and its generative nature makes it less stable than deterministic methods such as Marching Cubes.
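To make the two-stage design above more concrete, the following is a minimal PyTorch sketch of a VQ-VAE that tokenizes mesh faces into a discrete vocabulary and a point-cloud-conditioned decoder-only transformer that predicts those tokens autoregressively. All class names, layer choices, and sizes here are illustrative assumptions, not the authors' implementation or API; in particular, the single-linear-layer face and point-cloud encoders stand in for the real encoders, and the noise-resistant training of the decoder is not modeled.

```python
import torch
import torch.nn as nn


class MeshVQVAE(nn.Module):
    """Stage 1 (sketch): learn a discrete 'mesh vocabulary' by quantizing per-face features."""

    def __init__(self, vocab_size: int = 8192, dim: int = 512):
        super().__init__()
        # Each triangle face is flattened to 9 numbers (3 vertices x 3 coordinates).
        self.encoder = nn.Linear(9, dim)
        self.codebook = nn.Embedding(vocab_size, dim)
        # Reconstructs face coordinates from token embeddings; the paper additionally
        # trains this decoder to tolerate noisy token sequences (not modeled here).
        self.decoder = nn.Linear(dim, 9)

    def quantize(self, z: torch.Tensor) -> torch.Tensor:
        # Nearest-codebook-entry assignment (standard vector-quantization step),
        # using ||z||^2 - 2 z.e + ||e||^2 to avoid materializing all pairwise differences.
        w = self.codebook.weight
        dist = z.pow(2).sum(-1, keepdim=True) - 2 * (z @ w.t()) + w.pow(2).sum(-1)  # (B, F, V)
        return dist.argmin(dim=-1)                                                   # (B, F) ids

    def forward(self, faces: torch.Tensor):
        z = self.encoder(faces)                 # (B, F, dim)
        tokens = self.quantize(z)               # (B, F)
        recon = self.decoder(self.codebook(tokens))
        return tokens, recon


class ShapeConditionedTransformer(nn.Module):
    """Stage 2 (sketch): decoder-only transformer that autoregressively predicts mesh
    tokens, conditioned on a point-cloud embedding prepended to the token sequence."""

    def __init__(self, vocab_size: int = 8192, dim: int = 512,
                 n_layers: int = 6, n_heads: int = 8):
        super().__init__()
        self.point_encoder = nn.Linear(3, dim)  # stand-in for a real point-cloud encoder
        self.token_emb = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)  # made causal via the mask below
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, points: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        cond = self.point_encoder(points)                        # (B, P, dim) shape condition
        x = torch.cat([cond, self.token_emb(tokens)], dim=1)     # condition + mesh tokens
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.backbone(x, mask=mask)
        # Logits at the mesh-token positions; in training these are shifted so position i
        # predicts token i+1, and at inference tokens are sampled one at a time.
        return self.head(h[:, cond.size(1):])


# Usage sketch: tokenize 2 meshes with 100 faces each, then predict token logits
# conditioned on 256-point clouds sampled from the same shapes.
vqvae = MeshVQVAE()
transformer = ShapeConditionedTransformer()
faces = torch.randn(2, 100, 9)
points = torch.randn(2, 256, 3)
tokens, _ = vqvae(faces)
logits = transformer(points, tokens)            # (2, 100, 8192)
```

The key design point this sketch illustrates is the decoupling: the VQ-VAE fixes the token space, and the transformer only has to model the distribution of those tokens given a shape condition, which is what lets the system generate artist-like topology rather than densely reconstructing the surface.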