CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets


30 May 2024 | LONGWEN ZHANG, ZIYU WANG, QIXUAN ZHANG, QIWEI QIU, ANQI PANG, HAORAN JIANG, WEI YANG, LAN XU, JINGYI YU
CLAY is a novel 3D geometry and material generator designed to turn human imagination into intricate 3D digital structures. It accepts a variety of inputs, including text, images, and 3D-aware controls derived from diverse primitives. Its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT), which extracts rich 3D priors from a diverse range of 3D geometries. CLAY represents continuous, complete surfaces with neural fields, and its geometry generative module consists of pure transformer blocks operating in latent space. Trained with a progressive scheme on an ultra-large 3D model dataset, it is a native 3D geometry generator with 1.5 billion parameters.

For appearance generation, CLAY produces physically-based rendering (PBR) textures with a multi-view material diffusion model, yielding 2K-resolution textures in three modalities: diffuse, roughness, and metallic. CLAY supports a wide range of controllable adaptations and creations, from sketchy conceptual designs to production-ready assets with intricate details. Even first-time users can use CLAY to bring their 3D ideas to life.

**Key Features:**
- **Large-scale Generative Model:** 1.5 billion parameters, trained on high-quality 3D data.
- **Multi-resolution VAE and DiT:** The VAE efficiently encodes and decodes geometry; the DiT generates in its latent space.
- **Progressive Training Scheme:** Improves model performance and adaptability.
- **Data Standardization:** Remeshing and annotation to unify heterogeneous 3D datasets.
- **Material Synthesis:** PBR textures with diffuse, roughness, and metallic modalities.
- **Controllable Adaptations:** Conditioning on text, single images, multi-view images, voxels, point clouds, partial point clouds, and bounding boxes.
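The summary names the three material modalities but does not define how a renderer consumes them. The sketch below is a hedged illustration, assuming CLAY's diffuse map plays the role of the base color in the glTF 2.0 metallic-roughness model; the `PBRTexel` type and the helper names are hypothetical, not part of CLAY. It also works out the uncompressed footprint of one material at an assumed 2048×2048 ("2K") resolution with 8 bits per channel.

```python
from dataclasses import dataclass
from typing import Tuple

RGB = Tuple[float, float, float]


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a + (b - a) * t


@dataclass
class PBRTexel:
    """Hypothetical container for one texel of the three material maps.

    Assumption: the 'diffuse' map is treated as glTF's base color.
    """
    diffuse: RGB      # linear RGB in [0, 1]
    roughness: float  # 0 = smooth, 1 = fully rough
    metallic: float   # 0 = dielectric, 1 = metal


def shading_inputs(t: PBRTexel) -> Tuple[RGB, RGB, float]:
    """Derive BRDF inputs with the glTF 2.0 metallic-roughness formulas."""
    # The diffuse color fades to black as the surface becomes metallic.
    c_diff = tuple(lerp(c, 0.0, t.metallic) for c in t.diffuse)
    # Dielectrics reflect about 4% at normal incidence; metals tint it with the base color.
    f0 = tuple(lerp(0.04, c, t.metallic) for c in t.diffuse)
    # The microfacet alpha is the square of perceptual roughness.
    alpha = t.roughness ** 2
    return c_diff, f0, alpha


def material_bytes(resolution: int = 2048, channels: int = 5,
                   bytes_per_channel: int = 1) -> int:
    """Uncompressed size of one material: 3 diffuse + 1 roughness + 1 metallic channel."""
    return resolution * resolution * channels * bytes_per_channel
```

For a fully metallic texel, `shading_inputs` returns a black diffuse color and an `f0` equal to the base color, while a dielectric keeps its base color as diffuse and gets `f0 = (0.04, 0.04, 0.04)`. With the defaults, `material_bytes()` is 20,971,520 bytes, i.e. 20 MiB per uncompressed material, before any mipmaps.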
**Applications:**
- **3D Asset Creation:** From sketchy conceptual designs to production-ready assets.
- **Geometric Optimization:** Improves structural integrity and aesthetic quality.
- **Material Synthesis:** Adds lifelike appearance through realistic textures and materials.
- **Model Adaptation:** A versatile foundation model for efficient fine-tuning and conditional generation.

**Results:**
- **Diverse 3D Models:** Generation of a wide range of objects with intricate details and textures.
- **Image Conditioning:** Faithful generation of geometry from input images.
- **Multi-view Images:** Reliable reconstruction of 3D geometry from multiple views.
- **Point Clouds:** Effective surface reconstruction from sparse point clouds.
- **Geometric Optimization:** Improvement of 3D geometries generated by existing techniques.
- **Rich Variability:** Generation of diverse shapes from the same voxel input.
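The summary does not say how point clouds are turned into voxel controls or at what grid resolution. As a hedged illustration only, here is one common way to reduce a point cloud to a coarse occupancy grid; the `normalize`/`voxelize` names, the [-1, 1] normalization, and the 16³ resolution are assumptions, not CLAY's pipeline.

```python
import math
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def normalize(points: Iterable[Point]) -> List[Point]:
    """Center the cloud and scale its longest bounding-box side to span [-1, 1]."""
    pts = list(points)
    lo = [min(p[k] for p in pts) for k in range(3)]
    hi = [max(p[k] for p in pts) for k in range(3)]
    center = [(a + b) / 2 for a, b in zip(lo, hi)]
    # Guard against a degenerate (single-point) cloud with zero extent.
    half = max(b - a for a, b in zip(lo, hi)) / 2 or 1.0
    return [tuple((p[k] - center[k]) / half for k in range(3)) for p in pts]


def voxelize(points: Iterable[Point], resolution: int = 16) -> Set[Voxel]:
    """Return the occupied cells of a resolution^3 grid covering [-1, 1]^3."""
    occupied: Set[Voxel] = set()
    for p in points:
        # Map each coordinate from [-1, 1] to a cell index, clamped to [0, resolution).
        idx = tuple(
            min(max(math.floor((c + 1.0) / 2.0 * resolution), 0), resolution - 1)
            for c in p
        )
        occupied.add(idx)
    return occupied
```

Any surface passing through the same cells produces the same occupancy set, so a coarse grid like this leaves much of the fine shape undetermined. That ambiguity is one reason a single voxel input can correspond to many plausible shapes.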