20 Jan 2024 | Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Zhengzhe Liu, Hooman Shayani, Kamal Rahimi Malekshan, Chi-Wing Fu
Make-A-Shape is a large-scale 3D generative model trained on over 10 million diverse 3D shapes. It generates shapes across a wide range of object categories, featuring intricate geometric details, plausible structures, nontrivial topologies, and clean surfaces, and it is designed for efficient training. Generation can be conditioned on several input modalities, including single- and multi-view images, point clouds, and low-resolution voxels.

At the core of the model is a wavelet-tree representation that compactly encodes shapes through subband coefficient filtering and packing schemes, enabling efficient generation by a diffusion model. The representation is adaptive, allowing the model to train effectively on both coarse and detail wavelet coefficients, and the framework can be extended to control generation with additional input conditions.

Make-A-Shape produces high-quality results in a few seconds, surpassing the state of the art in both efficiency and quality. It supports unconditional generation, shape completion, and conditional generation across the modalities above. Because the wavelet-tree representation is compact, expressive, and efficient, it enables fast inference and training on millions of 3D shapes. Evaluated on various datasets, the model outperforms existing methods on intersection-over-union (IoU) and light field distance (LFD) metrics, demonstrating its effectiveness and versatility.
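To make the wavelet-tree idea more concrete, here is a minimal sketch of a 3D wavelet decomposition of a shape grid with magnitude-based filtering of detail coefficients, in the spirit of the paper's subband filtering. It is not the paper's exact scheme: the PyWavelets library, the `bior6.8` wavelet, the decomposition depth, and the `keep_ratio` threshold are all illustrative assumptions.

```python
# Sketch: 3D wavelet decomposition + magnitude-based detail filtering.
# Illustrative only; the wavelet, depth, and threshold are assumptions,
# not Make-A-Shape's published configuration.
import numpy as np
import pywt

def decompose_and_filter(grid: np.ndarray, wavelet: str = "bior6.8",
                         level: int = 2, keep_ratio: float = 0.1):
    """Decompose a 3D grid and zero out small detail coefficients.

    grid:       a (D, D, D) array, e.g. an SDF-style or occupancy grid
    keep_ratio: fraction of detail coefficients to keep, by magnitude
    """
    # coeffs[0] is the coarse (approximation) subband; coeffs[1:] are
    # dicts of detail subbands ('aad', 'ada', ..., 'ddd') per level.
    coeffs = pywt.wavedecn(grid, wavelet, level=level)

    # Pool all detail-coefficient magnitudes to pick a global threshold.
    detail_mags = np.concatenate(
        [np.abs(band).ravel() for d in coeffs[1:] for band in d.values()])
    thresh = np.quantile(detail_mags, 1.0 - keep_ratio)

    # The "filtering": zero out detail coefficients below the threshold.
    for d in coeffs[1:]:
        for key in d:
            band = d[key]
            d[key] = np.where(np.abs(band) >= thresh, band, 0.0)
    return coeffs

def reconstruct(coeffs, wavelet: str = "bior6.8") -> np.ndarray:
    """Invert the (filtered) decomposition back to a dense grid."""
    return pywt.waverecn(coeffs, wavelet)

# Usage: a random stand-in for a 64^3 shape grid.
grid = np.random.randn(64, 64, 64)
filtered = decompose_and_filter(grid)
approx = reconstruct(filtered)
print(approx.shape)  # (64, 64, 64): coarse structure kept, most detail dropped
```

The point of filtering before packing is that most detail coefficients of a smooth distance field are near zero, so keeping only the largest ones shrinks the representation substantially while preserving the surface.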
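The IoU metric used in the evaluation is simple to state for voxelized shapes; a minimal version, assuming two binary occupancy grids of equal resolution, is sketched below. (LFD, which compares renderings of the shapes from many viewpoints, is considerably more involved and omitted here.)

```python
import numpy as np

def voxel_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two binary occupancy grids.

    a, b: arrays of identical shape, truthy where a voxel is occupied.
    """
    a = a.astype(bool)
    b = b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 1.0

# Usage: two overlapping 32^3 occupancy grids.
x = np.zeros((32, 32, 32), dtype=bool); x[8:24, 8:24, 8:24] = True
y = np.zeros((32, 32, 32), dtype=bool); y[12:28, 12:28, 12:28] = True
print(voxel_iou(x, y))  # fraction of shared occupied volume
```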