Understanding WordRobe: Text-Guided Generation of Textured 3D Garments

**WordRobe: Text-Guided Generation of Textured 3D Garments**

**Authors:** Astitva Srivastava, Pranav Manu, Amit Raj, Varun Jampani, Avinash Sharma

**Abstract:** This paper addresses the challenging problem of generating high-quality, textured 3D garments from text prompts. WordRobe is a novel framework that learns a latent representation of 3D garments using a coarse-to-fine training strategy and a disentanglement loss, enabling better latent interpolation. The garment latent space is aligned with CLIP embedding space through weakly supervised training, allowing text-driven generation and editing. For texture synthesis, WordRobe leverages ControlNet's zero-shot generation capability to synthesize view-consistent texture maps in a single feed-forward step, significantly reducing generation time compared to existing methods. The method outperforms current state-of-the-art (SOTA) in learning 3D garment latent spaces, garment interpolation, and text-driven texture synthesis, as demonstrated by quantitative and qualitative evaluations.

**Key Contributions:**

- A novel framework and training strategy for text-driven 3D garment generation via a garment latent space.
- A new disentanglement loss for better separation of concepts in the latent space and a new metric to assess its performance.
- An optimization-free (single feed-forward) text-guided texture synthesis method that is both superior and efficient.

**Method:**

1. **3D Garment Latent Space:** Learn a latent space for unposed 3D garments using a two-stage encoder-decoder framework, representing garments as unsigned distance fields (UDFs).
2. **Mapping Network:** Predict garment latent codes from CLIP embeddings, enabling text-driven generation and editing.
3. **Text-guided Texture Synthesis:** Synthesize high-quality, diverse texture maps for 3D garments using ControlNet, maintaining view consistency in a single feed-forward step.

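
Method step 1 represents garments as unsigned distance fields. Garments are typically open surfaces (sleeve and neck openings), so there is no well-defined inside or outside for a signed distance; an unsigned field stores only the distance to the nearest surface point. The sketch below illustrates that idea alone, using a brute-force nearest-point query against a sampled flat patch. The function name `unsigned_distance` and the toy patch are assumptions made for this illustration, not WordRobe's encoder-decoder.

```python
import numpy as np

def unsigned_distance(queries, surface_points):
    """Distance from each query point to its nearest surface sample.

    queries: (Q, 3) array, surface_points: (S, 3) array.
    Brute force over all pairs; fine for a toy example only.
    """
    diffs = queries[:, None, :] - surface_points[None, :, :]  # (Q, S, 3)
    dists = np.linalg.norm(diffs, axis=-1)                    # (Q, S)
    return dists.min(axis=1)                                  # (Q,)

# Toy open surface: a unit square patch in the z = 0 plane (hypothetical data).
u, v = np.meshgrid(np.linspace(0.0, 1.0, 21), np.linspace(0.0, 1.0, 21))
patch = np.stack([u.ravel(), v.ravel(), np.zeros(u.size)], axis=-1)

# Points above, below, and on the patch.
queries = np.array([
    [0.5, 0.5, 0.2],
    [0.5, 0.5, -0.2],
    [0.5, 0.5, 0.0],
])
udf = unsigned_distance(queries, patch)
```

The points above and below the patch receive the same value, which is the property that lets an unsigned field describe a surface with no enclosed volume.
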
**Experiments and Results:**

- WordRobe generates high-quality, unposed 3D garments with diverse textures from user-friendly text prompts.
- Qualitative and quantitative evaluations show superior performance compared to existing methods.
- Ablation studies validate the effectiveness of the proposed components.
- User studies confirm the method's effectiveness in generating and editing 3D garments.

**Conclusion:** WordRobe is a novel method for text-driven generation and editing of textured 3D garments, achieving state-of-the-art performance in learning 3D garment latent spaces and generating high-fidelity texture maps. The method's efficiency and quality make it a promising solution for production-ready 3D garment generation from text prompts.
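
The summary describes two operations on the garment latent space: a mapping network that predicts latent codes from CLIP embeddings, and interpolation between latent codes. A minimal sketch of how those two pieces could fit together follows. Everything specific in it is an assumption for illustration: the dimensions `CLIP_DIM` and `LATENT_DIM`, the untrained two-layer MLP, the random vectors standing in for CLIP embeddings, and the use of plain linear interpolation. None of it is taken from the paper's architecture.

```python
import numpy as np

CLIP_DIM = 512    # assumption: width of the stand-in text embedding
LATENT_DIM = 32   # hypothetical garment latent size, not from the paper

rng = np.random.default_rng(0)

# Untrained two-layer MLP standing in for a mapping network.
W1 = rng.normal(scale=0.05, size=(CLIP_DIM, 128))
b1 = np.zeros(128)
W2 = rng.normal(scale=0.05, size=(128, LATENT_DIM))
b2 = np.zeros(LATENT_DIM)

def map_to_latent(embedding):
    """Map an embedding vector to a garment latent code."""
    hidden = np.maximum(embedding @ W1 + b1, 0.0)  # ReLU
    return hidden @ W2 + b2

def interpolate(z_a, z_b, t):
    """Linear blend between two latent codes, t in [0, 1]."""
    return (1.0 - t) * z_a + t * z_b

def random_unit_vector(dim):
    """Stand-in for a normalized text embedding of a prompt."""
    vec = rng.normal(size=dim)
    return vec / np.linalg.norm(vec)

z_a = map_to_latent(random_unit_vector(CLIP_DIM))  # e.g. prompt A
z_b = map_to_latent(random_unit_vector(CLIP_DIM))  # e.g. prompt B

# Five latent codes moving from garment A to garment B.
path = np.stack([interpolate(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 5)])
```

In a full pipeline each row of `path` would be passed to a garment decoder; how smoothly the decoded garments change along the path is the kind of behavior a disentangled latent space is meant to improve.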

WordRobe: Text-Guided Generation of Textured 3D Garments

14 Jul 2024 | Astitva Srivastava, Pranav Manu, Amit Raj, Varun Jampani, Avinash Sharma