28 May 2024 | Giuseppe Vecchio, Valentin Deschaintre
**MatSynth: A Modern PBR Materials Dataset**
**Introduction:**
MatSynth is a dataset of 4,069 high-quality, 4K, tileable materials with permissive licenses. Each material is augmented, rendered, and supplemented by metadata containing its origin, tags, categories, creation method, and more. The dataset aims to bridge the gap between private and public datasets, providing a larger, more diverse, and higher-resolution set of materials than previously available.
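The per-material information described above (origin, tags, category, creation method, plus the maps listed under Dataset Details) can be pictured as one record per material. The sketch below is a hypothetical layout, not MatSynth's actual schema. The class name `MaterialRecord`, all field names, the map key strings, the example values, and the assumption that physical size is stored in meters are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Map names follow the dataset description; the exact key strings are assumptions.
EXPECTED_MAPS = ("basecolor", "diffuse", "normal", "height", "roughness", "metallic", "specular")
PERMISSIVE_LICENSES = {"CC0", "CC-BY"}


@dataclass
class MaterialRecord:
    """Hypothetical per-material record; field names are illustrative only."""
    name: str
    category: str
    source: str                      # origin of the material (website or author)
    license: str                     # e.g. "CC0" or "CC-BY"
    creation_method: str             # hypothetical label, e.g. "scanned" or "procedural"
    tags: List[str] = field(default_factory=list)
    physical_size_m: Optional[float] = None  # assumed unit: real-world width of one tile in meters
    maps: Dict[str, str] = field(default_factory=dict)  # map name -> image path


def missing_maps(record: MaterialRecord) -> List[str]:
    """List the expected maps that a record does not provide."""
    return [m for m in EXPECTED_MAPS if m not in record.maps]


def is_permissive(record: MaterialRecord) -> bool:
    """True if the record carries one of the licenses the dataset draws from."""
    return record.license in PERMISSIVE_LICENSES


# Hypothetical example material, not an actual dataset entry.
rec = MaterialRecord(
    name="oak_planks",
    category="Wood",
    source="example-source",
    license="CC0",
    creation_method="scanned",
    maps={"basecolor": "oak_planks/basecolor.png", "normal": "oak_planks/normal.png"},
)
print(is_permissive(rec))  # True
print(missing_maps(rec))   # ['diffuse', 'height', 'roughness', 'metallic', 'specular']
```

Keeping maps as a name-to-path dictionary lets records with partial map sets (for example, materials without a specular map) share one type, with completeness checked separately.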
**Dataset Details:**
- **Materials:** 4,069 unique 4K PBR materials.
- **Augmented Materials:** 683,592 augmented material variants created with various scales, crops, rotations, and environment illuminations.
- **Renderings:** 3,417,960 renderings under different illumination conditions, exactly five times the number of augmented materials (5 × 683,592 = 3,417,960).
- **Maps and Metadata:** Each material includes PBR maps (base color, diffuse, normal, height, roughness, metallic, specular) and metadata annotations such as capture method, tags, source, license, and physical size.
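Because every material is tileable, augmentations such as crops can wrap across the tile border without introducing seams, and a "zoomed-out" scale can be produced by repeating the tile. The sketch below illustrates that idea on single-channel maps (height, roughness and similar) stored as nested lists. It is not the MatSynth augmentation pipeline: the function names, the scale choices `[1, 2]`, and the nearest-neighbour resampling are assumptions. A real pipeline would also need to rotate the x/y components of each normal-map vector when rotating a normal map, which this sketch does not do.

```python
import random
from typing import Dict, List

Grid = List[List[float]]  # one single-channel map, indexed [row][col]


def rotate90(grid: Grid, k: int) -> Grid:
    """Rotate a grid by k * 90 degrees counter-clockwise."""
    for _ in range(k % 4):
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


def tile_scale(grid: Grid, repeats: int) -> Grid:
    """Show repeats x repeats copies of the tile at the original resolution.

    Uses nearest-neighbour subsampling for brevity; a real pipeline would
    low-pass filter first to avoid aliasing.
    """
    h, w = len(grid), len(grid[0])
    return [[grid[(i * repeats) % h][(j * repeats) % w] for j in range(w)] for i in range(h)]


def wrap_crop(grid: Grid, top: int, left: int, size: int) -> Grid:
    """Crop a size x size window, wrapping across edges (seamless only for tileable maps)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(top + i) % h][(left + j) % w] for j in range(size)] for i in range(size)]


def augment(maps: Dict[str, Grid], rng: random.Random, crop: int) -> Dict[str, Grid]:
    """Draw one random scale, rotation and crop, and apply it to every map of a material."""
    first = next(iter(maps.values()))
    repeats = rng.choice([1, 2])  # hypothetical scale choices
    k = rng.randrange(4)
    top, left = rng.randrange(len(first)), rng.randrange(len(first[0]))
    return {
        name: wrap_crop(rotate90(tile_scale(grid, repeats), k), top, left, crop)
        for name, grid in maps.items()
    }
```

The random parameters are drawn once per material and shared across all of its maps, so the augmented height, roughness, and other maps stay spatially aligned.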
**Data Collection and Processing:**
- **Sources:** Materials were collected from various online sources under CC0 and CC-BY licenses.
- **Filtering:** More than 6,000 candidate materials were filtered to keep only those that are tileable and available at 4K resolution.
- **Quality Check:** Materials were visually inspected and automatically checked using CLIP embeddings to ensure quality and uniqueness.
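The filtering and uniqueness steps above can be sketched as a resolution/tileability filter followed by greedy near-duplicate removal on embedding vectors. This is an illustrative sketch only. It assumes the `tileable` flag is already known for each candidate and that a CLIP image embedding has been precomputed per material. The dictionary keys, the function names, and the cosine-similarity threshold of 0.95 are hypothetical and are not taken from the MatSynth pipeline.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def filter_candidates(candidates: List[dict], min_res: int = 4096) -> List[dict]:
    """Keep tileable candidates whose maps are at least min_res pixels on each side."""
    return [c for c in candidates if c["tileable"] and min(c["width"], c["height"]) >= min_res]


def deduplicate(
    candidates: List[dict],
    embeddings: Dict[str, List[float]],  # assumed precomputed CLIP embeddings, keyed by name
    threshold: float = 0.95,             # hypothetical similarity cutoff
) -> List[dict]:
    """Greedily drop any candidate too similar to one that was already kept."""
    kept: List[dict] = []
    for c in candidates:
        e = embeddings[c["name"]]
        if all(cosine(e, embeddings[k["name"]]) < threshold for k in kept):
            kept.append(c)
    return kept
```

The greedy pass keeps the first member of each near-duplicate group, so candidate order determines which copy survives; the comparison is quadratic in the number of kept materials, which remains manageable at a few thousand entries.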
**Evaluation:**
- **Test Set:** A static test set of 89 materials was created for evaluation.
- **Evaluation Methods:** State-of-the-art methods were trained on an existing dataset and on MatSynth, and the performance of the resulting models was compared.
- **Results:** Methods trained on MatSynth produced higher-quality material acquisition and generation results than those trained on the existing dataset, according to both quantitative metrics and qualitative comparisons.
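A comparison like the one above scores each model on the same fixed test set. The sketch below shows one common way to do that for predicted material maps: per-map RMSE averaged over the test materials. The summary does not state which metrics were used, so RMSE, the nested-dictionary layout `{material: {map_name: grid}}`, and the function names are all assumptions for illustration.

```python
import math
from typing import Dict, List

Grid = List[List[float]]  # one single-channel map, indexed [row][col]


def rmse(pred: Grid, target: Grid) -> float:
    """Root-mean-square error between two equally sized maps."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return math.sqrt(sum(diffs) / len(diffs))


def evaluate(
    predictions: Dict[str, Dict[str, Grid]],
    ground_truth: Dict[str, Dict[str, Grid]],
) -> Dict[str, float]:
    """Mean per-map RMSE over a fixed test set, keyed by map name."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for material, gt_maps in ground_truth.items():
        for map_name, gt in gt_maps.items():
            err = rmse(predictions[material][map_name], gt)
            totals[map_name] = totals.get(map_name, 0.0) + err
            counts[map_name] = counts.get(map_name, 0) + 1
    return {m: totals[m] / counts[m] for m in totals}
```

Running `evaluate` once for a model trained on each dataset, against the same ground truth, yields directly comparable per-map scores; a fixed test set is what makes that comparison meaningful.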
**Conclusion:**
MatSynth addresses the gap in material datasets by providing a large, high-quality, and diverse collection of materials. The dataset is intended to support research in material acquisition, material generation, and synthetic data generation.