**WordRobe: Text-Guided Generation of Textured 3D Garments**
**Authors:** Astitva Srivastava, Pranav Manu, Amit Raj, Varun Jampani, Avinash Sharma
**Abstract:**
This paper addresses the challenging problem of generating high-quality, textured 3D garments from text prompts. WordRobe is a novel framework that learns a latent representation of 3D garments using a coarse-to-fine training strategy and a disentanglement loss, enabling better latent interpolation. The garment latent space is aligned with CLIP embedding space through weakly supervised training, allowing text-driven generation and editing. For texture synthesis, WordRobe leverages ControlNet's zero-shot generation capability to synthesize view-consistent texture maps in a single feed-forward step, significantly reducing generation time compared to existing methods. The method outperforms current state-of-the-art (SOTA) in learning 3D garment latent spaces, garment interpolation, and text-driven texture synthesis, as demonstrated by quantitative and qualitative evaluations.
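The latent interpolation mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain linear blending between two latent codes; the summary does not state which interpolation scheme WordRobe uses, and the function name `interpolate_latents`, the 4-dimensional codes, and the variables `z_shirt` and `z_dress` are hypothetical placeholders rather than values from the paper.

```python
def interpolate_latents(z_a, z_b, num_steps):
    """Linearly blend two equal-length latent codes, endpoints included."""
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same dimension")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    path = []
    for i in range(num_steps):
        t = i / (num_steps - 1)  # blend weight: 0 gives z_a, 1 gives z_b
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Hypothetical low-dimensional codes standing in for two encoded garments.
z_shirt = [1.0, 0.0, 0.5, -1.0]
z_dress = [0.0, 2.0, 0.5, 1.0]

path = interpolate_latents(z_shirt, z_dress, num_steps=5)
for code in path:
    print(code)
```

In a setup like the one described, each intermediate code would be passed through the garment decoder to obtain an in-between garment. A better-disentangled latent space is what would make those intermediate shapes change one concept at a time.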
**Key Contributions:**
- A novel framework and training strategy for text-driven 3D garment generation via a garment latent space.
- A new disentanglement loss for better separation of concepts in the latent space and a new metric to assess its performance.
- An optimization-free (single feed-forward) text-guided texture synthesis method that outperforms existing methods while significantly reducing generation time.
**Method:**
1. **3D Garment Latent Space:** Learn a latent space for unposed 3D garments using a two-stage encoder-decoder framework, representing garments as unsigned distance fields (UDFs).
2. **Mapping Network:** Predict garment latent codes from CLIP embeddings, enabling text-driven generation and editing.
3. **Text-guided Texture Synthesis:** Synthesize high-quality, diverse texture maps for 3D garments using ControlNet, maintaining view consistency in a single feed-forward step.
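To make the unsigned distance field (UDF) representation in step 1 concrete, here is a toy sketch. A common motivation for UDFs is that garments are thin, open surfaces with no well-defined inside or outside, so a signed distance is ill-defined while an unsigned one is not. The example uses an analytic UDF for a flat square patch; in WordRobe the field is instead predicted by the learned decoder, and the function name `udf_square_patch` and the patch geometry are assumptions made purely for illustration.

```python
import math


def udf_square_patch(point):
    """Unsigned distance from a 3D point to the open patch [0, 1] x [0, 1] in the z = 0 plane."""
    x, y, z = point
    # In-plane offset to the patch: zero when (x, y) lies directly over it.
    dx = max(0.0 - x, 0.0, x - 1.0)
    dy = max(0.0 - y, 0.0, y - 1.0)
    # Euclidean distance to the closest point on the patch; never negative.
    return math.sqrt(dx * dx + dy * dy + z * z)


above = udf_square_patch((0.5, 0.5, 0.3))    # point in front of the patch
below = udf_square_patch((0.5, 0.5, -0.3))   # mirrored point behind the patch
beside = udf_square_patch((2.0, 0.5, 0.0))   # point in the plane, past the edge
on_surface = udf_square_patch((0.5, 0.5, 0.0))

print(above, below, beside, on_surface)
```

The points in front of and behind the patch get the same value, which is the property that lets a UDF describe an open sheet of cloth. The surface itself is the zero-level set of the field.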
**Experiments and Results:**
- WordRobe generates high-quality, unposed 3D garments with diverse textures from user-friendly text prompts.
- Qualitative and quantitative evaluations show that WordRobe outperforms existing methods in learning 3D garment latent spaces, garment interpolation, and text-driven texture synthesis.
- Ablation studies validate the effectiveness of the proposed components.
- User studies confirm the method's effectiveness in generating and editing 3D garments.
**Conclusion:**
WordRobe is a novel method for text-driven generation and editing of textured 3D garments, achieving state-of-the-art performance in learning 3D garment latent spaces and generating high-fidelity texture maps. The method's efficiency and quality make it a promising solution for production-ready 3D garment generation from text prompts.