PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI


10 Jul 2024 | Yandan Yang, Baoxiong Jia, Peiyuan Zhi, Siyuan Huang
**Abstract:** The paper introduces PHYSCENE, a novel method for generating physically interactable 3D scenes tailored for embodied agents. PHYSCENE addresses the gap between scene synthesis and embodied AI by incorporating realistic layouts, articulated objects, and rich physical interactivity. The method uses a conditional diffusion model to capture scene layouts and integrates physics- and interactivity-based guidance mechanisms that enforce constraints on object collision, room layout, and object reachability. Extensive experiments demonstrate that PHYSCENE outperforms state-of-the-art scene synthesis methods in physical plausibility and interactivity, making it a promising tool for facilitating skill acquisition in interactive environments.

**Introduction:** Scene synthesis has evolved from creating realistic environments for indoor design to supporting complex embodied tasks in simulated environments. However, seamless scene generation for embodied AI (EAI) remains challenging because generated scenes must satisfy physical constraints and support interaction. PHYSCENE aims to bridge this gap by integrating physical commonsense into scene synthesis.

**Method:** PHYSCENE leverages guided diffusion models to learn scene distributions and generate scenes with realistic layouts and interactable objects. It incorporates shape and geometry features to bridge rigid-body objects with articulated object datasets. Key constraints, including collision avoidance, room-layout adherence, and object reachability, are converted into guidance functions that steer the denoising process toward physically plausible, interactable scenes.

**Evaluation:** Experiments on the 3D-FRONT dataset show that PHYSCENE significantly reduces collision rates and improves physical plausibility compared to existing methods. It also enhances interactivity, generating scenes with fewer floor-plan violations and better object reachability.

**Contributions:**
- PHYSCENE, a guided diffusion model for physically interactable scene synthesis.
- Novel guidance functions that enforce physical constraints and interactivity.
- State-of-the-art performance on both traditional scene synthesis metrics and physical plausibility metrics.

**Related Work:** The paper reviews existing approaches to indoor scene synthesis, physical plausibility, and guided diffusion models, highlighting how PHYSCENE differs from each.

**Conclusion:** PHYSCENE effectively generates physically interactable 3D scenes, enhancing the realism and interactivity of environments for embodied agents. Future work will extend PHYSCENE to small objects and more complex manipulation tasks.
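To make the idea of constraint-based guidance concrete, here is a minimal toy sketch. It is a hypothetical illustration, not the paper's implementation: object footprints are modeled as axis-aligned 2D boxes, the collision guidance cost is the total pairwise overlap area, and its gradient nudges box centers at each step, standing in for the correction a guided diffusion sampler would apply during denoising. All function names and parameters below are invented for illustration.

```python
import numpy as np

def collision_cost(boxes):
    """Guidance cost: total pairwise overlap area of boxes [cx, cy, w, h]."""
    cost = 0.0
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            cx1, cy1, w1, h1 = boxes[i]
            cx2, cy2, w2, h2 = boxes[j]
            ox = max(0.0, min(cx1 + w1 / 2, cx2 + w2 / 2) - max(cx1 - w1 / 2, cx2 - w2 / 2))
            oy = max(0.0, min(cy1 + h1 / 2, cy2 + h2 / 2) - max(cy1 - h1 / 2, cy2 - h2 / 2))
            cost += ox * oy
    return cost

def guidance_grad(boxes, eps=1e-4):
    """Finite-difference gradient of the cost w.r.t. box centers only."""
    grad = np.zeros_like(boxes)
    base = collision_cost(boxes)
    for i in range(len(boxes)):
        for k in (0, 1):  # perturb cx, cy; sizes stay fixed
            bumped = boxes.copy()
            bumped[i, k] += eps
            grad[i, k] = (collision_cost(bumped) - base) / eps
    return grad

def apply_guidance(boxes, steps=50, scale=0.2):
    """Nudge box centers down the guidance gradient; a stand-in for the
    per-step correction a guided diffusion sampler applies while denoising."""
    boxes = boxes.copy()
    for _ in range(steps):
        boxes[:, :2] -= scale * guidance_grad(boxes)[:, :2]
    return boxes

# Random initial "layout" of four 0.8 x 0.8 footprints; guidance separates them.
rng = np.random.default_rng(0)
layout = rng.uniform(-1.0, 1.0, size=(4, 4))
layout[:, 2:] = 0.8
guided = apply_guidance(layout)
print(collision_cost(layout), "->", collision_cost(guided))
```

In PHYSCENE the analogous gradients come from differentiable guidance functions for collision, room layout, and reachability, applied to the diffusion model's intermediate layout predictions rather than directly to final box positions.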