FlashTex: Fast Relightable Mesh Texturing with LightControlNet


22 Apr 2024 | Kangle Deng, Timothy Omernick, Alexander Weiss, Deva Ramanan, Jun-Yan Zhu, Tinghui Zhou, Maneesh Agrawala
**Institutional Affiliations:** Roblox, Carnegie Mellon University, Stanford University

**Abstract:** Creating high-quality textures for 3D meshes is crucial in many industries, but manual texturing is labor-intensive and requires substantial artistic training. This paper proposes a fast approach to automatically texture an input 3D mesh from a user-provided text prompt. The key innovation is LightControlNet, a text-to-image diffusion model based on the ControlNet architecture that allows the desired lighting to be specified as a conditioning image. The pipeline has two stages: the first uses LightControlNet to generate visually consistent reference views of the mesh, and the second applies texture optimization using Score Distillation Sampling (SDS) to improve texture quality while disentangling surface material from lighting. The method significantly outperforms previous text-to-texture methods in both speed and texture quality, making it suitable for applications that require relighting in different environments.

**Introduction:** High-quality mesh textures are essential in gaming, film, animation, AR/VR, and industrial design, yet authoring them by hand is time-consuming and demands expertise. Recent text-to-image diffusion models have shifted the paradigm for image creation, but existing text-to-texture methods built on them suffer from slow generation, visual artifacts, and baked-in lighting, making them unsuitable for many commercial applications. This paper addresses these limitations with an efficient approach that disentangles lighting from surface material/reflectance, enabling proper relighting.

**Method:** The method uses LightControlNet, an illumination-aware text-to-image diffusion model, to generate relightable textures in two stages. Stage 1 uses multi-view visual prompting with LightControlNet to generate visually consistent reference views of the mesh. Stage 2 applies texture optimization using SDS to improve texture quality and disentangle lighting from surface material. The result is significantly faster than previous methods while producing high-quality, relightable textures. A hedged sketch of both stages is given below.
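To make the two-stage pipeline more concrete, below is a minimal, hypothetical sketch in PyTorch. It is not the authors' released implementation: the LightControlNet checkpoint path is a placeholder (the paper trains its own lighting-conditioned ControlNet), `render_view` stands in for a real differentiable renderer of the textured mesh, `predict_noise` stands in for the diffusion model's noise predictor, and the noise schedule is a toy. The SDS step, however, follows the standard trick of treating the residual between predicted and injected noise as a gradient on the rendered image and backpropagating it into the texture parameters.

```python
# Hypothetical sketch of a FlashTex-style two-stage pipeline (not the authors' code).
import torch


# ---------------------------------------------------------------------------
# Stage 1 (sketch): generate visually consistent reference views with a
# lighting-conditioned ControlNet. The checkpoint path is a placeholder; the
# paper trains its own LightControlNet. The conditioning images would be
# renders of the mesh under known, fixed lighting (possibly tiled into a grid
# so several views are generated jointly for consistency).
# ---------------------------------------------------------------------------
def generate_reference_views(prompt, conditioning_images):
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    controlnet = ControlNetModel.from_pretrained("path/to/lightcontrolnet")  # hypothetical checkpoint
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet
    )
    return [pipe(prompt, image=cond).images[0] for cond in conditioning_images]


# ---------------------------------------------------------------------------
# Stage 2 (sketch): optimize a relightable PBR texture with an SDS-style loss.
# ---------------------------------------------------------------------------
def render_view(texture, view_idx):
    # Stand-in for a differentiable renderer: a real implementation would
    # rasterize the mesh from camera `view_idx` and shade the albedo /
    # roughness / metallic channels under a chosen environment map.
    return texture.mean(dim=0, keepdim=True).expand(3, -1, -1)[None]


def predict_noise(noisy_image, t, prompt_embedding):
    # Stand-in for the (LightControlNet-guided) diffusion model's epsilon prediction.
    return torch.randn_like(noisy_image)


def optimize_texture(num_steps=400, resolution=512):
    # 5 channels: RGB albedo + roughness + metallic (one possible parameterization).
    texture = torch.nn.Parameter(torch.rand(5, resolution, resolution))
    optimizer = torch.optim.Adam([texture], lr=1e-2)
    for step in range(num_steps):
        view = render_view(texture, view_idx=step % 4)
        t = torch.randint(20, 980, (1,))
        noise = torch.randn_like(view)
        alpha_bar = torch.cos(t / 1000.0 * torch.pi / 2) ** 2        # toy noise schedule
        noisy = alpha_bar.sqrt() * view + (1 - alpha_bar).sqrt() * noise
        eps_hat = predict_noise(noisy, t, prompt_embedding=None)
        # SDS: treat (eps_hat - noise) as a gradient on the rendered image and
        # backpropagate it through the renderer into the texture parameters.
        grad = (eps_hat - noise).detach()
        loss = (grad * view).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return texture.detach()


if __name__ == "__main__":
    tex = optimize_texture(num_steps=10)   # tiny run for illustration only
    print(tex.shape)
```

Intuitively, because the desired lighting is supplied explicitly through the conditioning image, the optimization is discouraged from baking shading into the albedo; this is the material/lighting disentanglement the summary refers to.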
**Experiments:** Comprehensive experiments on the Objaverse dataset and a curated set of 3D game assets demonstrate the effectiveness of the proposed method. It outperforms existing baselines in both quantitative and qualitative evaluations, showing superior texture quality and relighting capability. A user study further confirms that participants prefer the proposed method over baselines in terms of realism, texture consistency, and plausibility under varied lighting conditions.

**Discussion:** The proposed method addresses the limitations of existing techniques by providing fast, high-fidelity textures with disentangled lighting and surface reflectance. However, it still has some limitations, such as baked-in lighting in certain cases and incomplete disentanglement of material parameters.