WordRobe is a novel framework for generating high-quality, unposed 3D garments with realistic textures from user-friendly text prompts. It learns a latent space of 3D garments with a two-stage encoder-decoder and aligns that latent space to the CLIP embedding space in a weakly supervised manner, enabling text-driven generation and editing. For texture synthesis, it leverages ControlNet to produce view-consistent textures in a single feed-forward step, substantially improving both efficiency and quality. Because the generated garments are unposed, they can be used directly in standard cloth simulation and animation pipelines without post-processing; the method also supports sketch-guided generation and editing, as well as texture synthesis from images.

WordRobe is evaluated on a large dataset of unposed garments and generalizes across garment categories and styles. In quantitative comparisons with state-of-the-art methods it shows superior performance in garment interpolation, surface quality, view consistency, and texture synthesis, and in user studies it is preferred over existing methods for both text-to-result relationship and overall quality.
Its ability to generate diverse, high-fidelity textured garments, together with the efficiency of its texture synthesis, makes WordRobe a scalable solution for automated 3D garment creation.
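To make the alignment idea concrete, the sketch below shows one way a garment latent code could be pulled toward the CLIP embedding of its caption. This is a minimal, hypothetical example rather than the authors' implementation: the mapper architecture, the latent and embedding dimensions, and the cosine-based loss are all assumptions.

```python
# Illustrative sketch (not the authors' code): aligning a garment latent space
# to CLIP text embeddings with a simple cosine-similarity objective.
# LatentToCLIP, latent_dim, clip_dim, and the loss form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, clip_dim = 128, 512  # assumed sizes

class LatentToCLIP(nn.Module):
    """Small MLP that maps garment latent codes into the CLIP embedding space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, clip_dim),
        )

    def forward(self, z):
        return F.normalize(self.net(z), dim=-1)

def alignment_loss(z, text_emb, mapper):
    """Weakly supervised alignment: pull the mapped garment latent toward the
    CLIP embedding of its (possibly noisy, automatically generated) caption."""
    pred = mapper(z)                          # (B, clip_dim), unit-norm
    target = F.normalize(text_emb, dim=-1)    # (B, clip_dim), unit-norm
    return (1.0 - (pred * target).sum(dim=-1)).mean()

# Toy usage with random tensors standing in for encoder outputs and
# precomputed CLIP text embeddings.
mapper = LatentToCLIP()
z = torch.randn(8, latent_dim)       # garment latents from a trained encoder
text_emb = torch.randn(8, clip_dim)  # CLIP embeddings of garment captions
loss = alignment_loss(z, text_emb, mapper)
loss.backward()
print(float(loss))
```

A cosine objective is used here only because CLIP embeddings are typically compared on the unit sphere; the paper's actual weakly supervised objective may differ.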