Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation

31 Jul 2024 | Daniel Honerkamp, Martin Büchner, Fabien Despinoy, Tim Welschehold, Abhinav Valada
This paper introduces MoMa-LLM, a novel approach that integrates language models with structured representations derived from open-vocabulary scene graphs, dynamically updated as the environment is explored. The method is designed to enable mobile manipulation robots to autonomously execute long-horizon tasks in large, unexplored environments. MoMa-LLM intertwines high-level reasoning with scalable dynamic scene representations, grounding large language models in hierarchical 3D scene graphs that include object- and room-level entities, as well as a navigational Voronoi graph. The approach is zero-shot, open-vocabulary, and can be extended to a wide range of mobile manipulation and household robotic tasks. The effectiveness of MoMa-LLM is demonstrated in a semantic interactive search task in large, realistic indoor environments, showing improved search efficiency compared to conventional baselines and state-of-the-art approaches. The code for MoMa-LLM is publicly available at <http://moma-llm.cs.uni-freiburg.de>.
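To make the described representation concrete, the sketch below illustrates one plausible way to structure a hierarchical scene graph with object- and room-level entities plus Voronoi navigation nodes, and to serialize it into text that grounds a language model. This is a minimal illustration, not the authors' implementation: the class names (`ObjectNode`, `RoomNode`, `SceneGraph`), their fields, and the `to_prompt` serialization are all assumptions for exposition.

```python
# Minimal sketch (not the MoMa-LLM codebase) of a dynamically updated,
# hierarchical scene graph: rooms contain objects and navigational
# Voronoi nodes; the graph is serialized to text to ground an LLM.
# All names and fields are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    name: str                                   # open-vocabulary label, e.g. "mug"
    position: tuple[float, float, float]        # 3D position in the map frame
    explored: bool = False                      # e.g. whether a container was opened


@dataclass
class RoomNode:
    name: str                                   # inferred room type, e.g. "kitchen"
    objects: list[ObjectNode] = field(default_factory=list)
    voronoi_nodes: list[tuple[float, float]] = field(default_factory=list)


@dataclass
class SceneGraph:
    rooms: list[RoomNode] = field(default_factory=list)

    def update(self, room: RoomNode) -> None:
        """Extend the graph dynamically as new areas are explored."""
        self.rooms.append(room)

    def to_prompt(self) -> str:
        """Serialize the structured scene into text for LLM grounding."""
        lines = []
        for room in self.rooms:
            items = ", ".join(o.name for o in room.objects) or "nothing observed yet"
            lines.append(f"{room.name}: contains {items}")
        return "\n".join(lines)


# Usage: build a partial graph from exploration and produce grounding text.
kitchen = RoomNode("kitchen", objects=[ObjectNode("fridge", (1.0, 2.0, 0.0))])
graph = SceneGraph()
graph.update(kitchen)
print(graph.to_prompt())  # -> "kitchen: contains fridge"
```

In such a design, the room-level grouping keeps the LLM prompt compact as the map grows, while the Voronoi nodes remain available for low-level navigation planning outside the language model.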