2 May 2024 | Guangyao Zhai, Evin Pinar Örnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam
EchoScene is a method for generating 3D indoor scenes from scene graphs using a dual-branch diffusion model. By dynamically adapting to the input graph, it enables interactive and controllable scene generation and addresses the central difficulty of graph-conditioned synthesis: handling varying numbers of nodes and complex edge combinations.

EchoScene associates each graph node with its own denoising process and lets these processes exchange information collaboratively, which improves both controllability and consistency. Under its information echo scheme, the processes share intermediate denoising states through an information exchange unit at every step, keeping each process globally aware of the scene graph and producing globally coherent scenes. The model is trained as a dual-branch diffusion model in which each branch contains multiple denoising processes: the layout branch generates the scene layout, while the shape branch generates the object shapes.
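To make the echo scheme concrete, here is a minimal sketch of the idea in PyTorch: one denoising process per graph node, all advanced in lockstep, with intermediate states exchanged through a shared unit before every denoising step. The module names (ExchangeUnit, NodeDenoiser), the message-passing design, and the simplified update rule are illustrative assumptions, not EchoScene's actual architecture.

```python
import torch
import torch.nn as nn

class ExchangeUnit(nn.Module):
    """Illustrative information exchange unit: each node's latent is
    updated from its neighbors' latents and the connecting edge embeddings,
    so every denoising process stays aware of the whole scene graph."""
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(3 * dim, dim)  # (src, dst, edge) -> message
        self.upd = nn.GRUCell(dim, dim)     # fold messages into node state

    def forward(self, h, edges, edge_emb):
        # h: (N, dim) node latents; edges: (E, 2) index pairs; edge_emb: (E, dim)
        agg = torch.zeros_like(h)
        for (s, d), e in zip(edges.tolist(), edge_emb):
            agg[d] = agg[d] + self.msg(torch.cat([h[s], h[d], e]))
        return self.upd(agg, h)  # every node now "echoes" graph-wide context

class NodeDenoiser(nn.Module):
    """Per-node denoiser conditioned on the exchanged (echoed) latent."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(),
                                 nn.Linear(dim, dim))

    def forward(self, x_t, h, t):
        t_feat = t.expand(x_t.size(0), 1)  # broadcast timestep to all nodes
        return self.net(torch.cat([x_t, h, t_feat], dim=-1))  # predicted noise

def echo_denoise(x_T, h, edges, edge_emb, exchange, denoiser, steps=50):
    """Run all per-node denoising processes in lockstep, exchanging
    information through the shared unit before every denoising step."""
    x = x_T
    for step in reversed(range(steps)):
        t = torch.tensor([step / steps])
        h = exchange(h, edges, edge_emb)  # echo: share states graph-wide
        eps = denoiser(x, h, t)           # per-node noise prediction
        x = x - eps / steps               # placeholder update, not a real DDPM rule
    return x

# Toy usage: 3 nodes, 2 relations, 32-dim latents.
N, E, D = 3, 2, 32
edges = torch.tensor([[0, 1], [1, 2]])
out = echo_denoise(torch.randn(N, D), torch.randn(N, D), edges,
                   torch.randn(E, D), ExchangeUnit(D), NodeDenoiser(D))
```

In this sketch the same pattern would run once per branch (layout latents in one, shape latents in the other), with the exchange unit keeping the two sets of processes consistent.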
Evaluated on the SG-FRONT dataset, EchoScene improves generation fidelity and inter-object consistency over previous methods. It outperforms existing approaches on FID, FID_CLIP, and KID, is more robust to graph manipulation, and better maintains scene-graph constraints and inter-object style consistency. The generated layouts and shapes are coherent and high quality, and because EchoScene is compatible with off-the-shelf texture generators, the scenes can be given photorealistic appearances. These properties make the method applicable to downstream tasks such as robotic imagination and manipulation.
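For reference, fidelity here is measured with the standard Fréchet Inception Distance: the Fréchet distance between Gaussians fitted to features of real and generated renderings (FID_CLIP swaps the Inception feature extractor for CLIP, and KID uses a kernel-based estimate instead). Below is a minimal sketch of the FID computation, assuming features have already been extracted; the paper's exact evaluation pipeline may differ.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """FID between two feature sets of shape (num_samples, feat_dim):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from numerics
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Sanity check: identical feature sets give a (near-)zero distance.
feats = np.random.randn(256, 64)
print(frechet_distance(feats, feats))  # ~0 up to numerical error
```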