RGB↔X: Image decomposition and synthesis using material- and lighting-aware diffusion models

1 May 2024 | Zheng Zeng, Valentin Deschaintre, Iliyan Georgiev, Yannick Hold-Geoffroy, Yiwei Hu, Fujun Luan, Ling-Qi Yan, Miloš Hašan
The paper presents a unified diffusion framework for image decomposition and synthesis, focusing on interior scenes. The framework comprises two models: RGB→X and X→RGB. The RGB→X model estimates intrinsic channels (albedo, normal, roughness, metallicity, and lighting) from an input RGB image, while the X→RGB model synthesizes realistic images from such intrinsic channels. RGB→X improves on previous intrinsic-decomposition methods by training on multiple heterogeneous datasets and by adding support for lighting estimation. X→RGB synthesizes realistic images from given intrinsic channels, and supports partial channel information as well as optional text prompts.

The effectiveness of both models is demonstrated through experiments on material editing, object insertion, and relighting. The models achieve high-quality estimates and syntheses, outperforming existing methods on albedo, normal, roughness, metallicity, and lighting estimation. The paper also discusses challenges and limitations, such as variability in material-property estimation and the difficulty of handling heterogeneous data during training. Overall, the work contributes a unified framework for image decomposition and synthesis that can benefit a wide range of downstream editing tasks.
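To make the two-model workflow concrete, here is a minimal, purely illustrative sketch of the decompose-edit-resynthesize loop the paper enables. The diffusion models are replaced by toy diffuse-shading stand-ins, and all function names and the channel-dictionary layout are assumptions rather than the authors' API; the point is only the data flow: RGB→X produces intrinsic channels, one channel is edited, and X→RGB renders the result.

```python
# Hypothetical sketch of the RGB->X / X->RGB editing loop described above.
# The real models are diffusion networks; simple stand-in functions are used
# here to illustrate the data flow only.
import numpy as np

def decompose_rgb_to_x(rgb, irradiance):
    """Stand-in for the RGB->X model.

    In the paper, a diffusion model estimates albedo, normals, roughness,
    metallicity, and lighting from the RGB image alone. This toy version
    assumes a purely diffuse scene with known irradiance, so albedo can be
    divided out directly.
    """
    albedo = rgb / np.maximum(irradiance, 1e-6)
    return {
        "albedo": np.clip(albedo, 0.0, 1.0),
        "irradiance": irradiance,
        # The real model also predicts these channels; placeholders here.
        "normal": np.zeros_like(rgb),
        "roughness": np.ones(rgb.shape[:2]),
        "metallicity": np.zeros(rgb.shape[:2]),
    }

def synthesize_x_to_rgb(channels):
    """Stand-in for the X->RGB model.

    The real model is a diffusion model that tolerates partial channel input
    and optional text prompts; this toy uses diffuse shading only.
    """
    return channels["albedo"] * channels["irradiance"]

# Round-trip material edit: decompose, recolor the albedo, re-synthesize.
rng = np.random.default_rng(0)
irradiance = rng.uniform(0.2, 1.0, size=(4, 4, 3))
albedo_true = rng.uniform(0.0, 1.0, size=(4, 4, 3))
rgb = albedo_true * irradiance

x = decompose_rgb_to_x(rgb, irradiance)
rgb_rec = synthesize_x_to_rgb(x)
print("round-trip error:", np.abs(rgb_rec - rgb).max())

x["albedo"] *= np.array([1.0, 0.5, 0.5])  # material edit: tint toward red
rgb_edited = synthesize_x_to_rgb(x)
print("edited image shape:", rgb_edited.shape)
```

In the actual framework both directions are learned diffusion models, which is what lets the decomposition work from the RGB image alone and lets the synthesis fill in plausible detail when only some channels are provided.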