**CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets**
CLAY is a 3D geometry and material generator that turns text, images, and 3D-aware controls built from diverse primitives into detailed 3D assets. Its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT), which together extract rich 3D priors from a diverse range of 3D geometries. Geometry is represented with neural fields, which describe continuous and complete surfaces, and is generated by a module built purely from transformer blocks operating in latent space. A progressive training scheme on an ultra-large 3D model dataset yields a 3D-native geometry generator with 1.5 billion parameters. For appearance, a multi-view material diffusion model produces physically-based rendering (PBR) textures at 2K resolution with diffuse, roughness, and metallic modalities. CLAY supports a wide range of controllable adaptations and creations, from sketchy conceptual designs to production-ready assets with intricate details, and is intended to be usable even by first-time users.
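To make the neural-field idea concrete, the sketch below treats a surface as a function that can be queried at any 3D point and then sampled on grids of different resolution. It is only an illustration: `sphere_occupancy` is a hand-written analytic stand-in for a learned field decoder, and the names `sample_grid` and `occupied_fraction` are hypothetical helpers, not part of CLAY.

```python
import math

def sphere_occupancy(x, y, z, radius=0.5):
    """Analytic stand-in for a learned field: 1.0 inside the surface, 0.0 outside."""
    return 1.0 if math.sqrt(x * x + y * y + z * z) <= radius else 0.0

def sample_grid(field, resolution):
    """Query the field at the cell centers of a resolution^3 grid over [-1, 1]^3."""
    step = 2.0 / resolution
    centers = [-1.0 + (i + 0.5) * step for i in range(resolution)]
    return [[[field(x, y, z) for z in centers] for y in centers] for x in centers]

def occupied_fraction(grid):
    """Fraction of sampled cells that fall inside the surface."""
    values = [v for plane in grid for row in plane for v in row]
    return sum(values) / len(values)

# The same continuous field can be discretized coarsely or finely on demand.
coarse = sample_grid(sphere_occupancy, 8)
fine = sample_grid(sphere_occupancy, 32)

# Exact volume fraction of a radius-0.5 sphere inside the [-1, 1]^3 cube.
exact = (4.0 / 3.0) * math.pi * 0.5 ** 3 / 8.0
print(occupied_fraction(coarse), occupied_fraction(fine), exact)
```

Because the field is defined everywhere rather than on a fixed grid, the sampling resolution is a choice made at extraction time, which is the property that lets a field-based representation describe continuous, complete surfaces.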
**Key Features:**
- **Large-scale Generative Model:** 1.5 billion parameters, trained on high-quality 3D data.
- **Multi-resolution VAE and DiT:** Efficient geometric data encoding and decoding.
- **Progressive Training Scheme:** Enhances model performance and adaptability.
- **Data Standardization:** Remeshing and annotation techniques for unified 3D datasets.
- **Material Synthesis:** Physically-based rendering (PBR) textures with rich modalities.
- **Controllable Adaptations:** Supports text, image, voxel, multi-view images, point clouds, bounding boxes, and partial point clouds.
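As a rough sense of scale for the PBR output listed above, the following sketch computes the uncompressed size of one texture set. It rests on assumptions that the summary does not state: "2K" is taken to mean 2048 x 2048 texels, diffuse is taken as three channels (RGB) with roughness and metallic as one channel each, and each channel is stored in 8 bits. `PBRTextureSet` is a hypothetical container used only for this arithmetic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PBRTextureSet:
    # Assumption: "2K" means a 2048 x 2048 texture.
    resolution: int = 2048
    # Assumption: diffuse is RGB; roughness and metallic are scalar maps.
    channels: tuple = (("diffuse", 3), ("roughness", 1), ("metallic", 1))
    # Assumption: 8 bits (1 byte) per channel.
    bytes_per_channel: int = 1

    def texels(self):
        return self.resolution * self.resolution

    def total_channels(self):
        return sum(count for _, count in self.channels)

    def uncompressed_bytes(self):
        return self.texels() * self.total_channels() * self.bytes_per_channel

texture_set = PBRTextureSet()
size_mib = texture_set.uncompressed_bytes() / (1024 * 1024)
print(f"{texture_set.total_channels()} channels, {size_mib:.0f} MiB uncompressed")
```

Under these assumptions one asset carries 5 channels over 4,194,304 texels, or 20 MiB before any compression.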
**Applications:**
- **3D Asset Creation:** From sketchy conceptual designs to production-ready assets.
- **Geometric Optimization:** Ensures structural integrity and aesthetic refinement.
- **Material Synthesis:** Adds lifelike qualities through realistic textures and materials.
- **Model Adaptation:** Versatile foundation model for efficient fine-tuning and conditional generation.
**Results:**
- **Diverse 3D Models:** Generation of a wide range of objects with intricate details and textures.
- **Image Conditioning:** Faithful generation of geometric entities from input images.
- **Multi-view Images:** Reliable reconstruction of 3D geometries from multiple perspectives.
- **Point Clouds:** Effective surface reconstruction from sparse point clouds.
- **Geometric Optimization:** Improvement of 3D geometries generated by existing techniques.
- **Rich Variability:** Generation of diverse shapes from the same voxel input.
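One way to see why a single voxel input can admit many different shapes is that voxelization discards detail: distinct point sets collapse to the same coarse occupancy grid. The sketch below is a minimal illustration of that mapping, not CLAY's conditioning pipeline; `voxelize`, the grid resolution, and the `[-1, 1]` bounds are all assumptions chosen for the example.

```python
def voxelize(points, resolution=4, lo=-1.0, hi=1.0):
    """Return the set of (i, j, k) voxel indices that contain at least one point."""
    scale = resolution / (hi - lo)
    occupied = set()
    for point in points:
        index = []
        for coord in point:
            i = int((coord - lo) * scale)
            # Clamp so points on or beyond the bounds land in an edge voxel.
            index.append(min(max(i, 0), resolution - 1))
        occupied.add(tuple(index))
    return occupied

# Two different point sets (hypothetical data) ...
cloud_a = [(-0.9, -0.9, -0.9), (0.0, 0.0, 0.0), (0.9, 0.9, 0.9)]
cloud_b = [(-0.8, -0.95, -0.85), (0.1, 0.2, 0.3), (0.6, 0.7, 1.0)]

# ... that occupy exactly the same voxels at resolution 4.
grid_a = voxelize(cloud_a)
grid_b = voxelize(cloud_b)
print(sorted(grid_a), grid_a == grid_b)
```

Since many fine geometries are consistent with the same coarse grid, a generator conditioned on that grid still has room to produce varied results.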