Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

3 Jun 2024 | Abdelrhaman Werby¹*, Chenguang Huang¹*, Martin Büchner¹*, Abhinav Valada¹, Wolfram Burgard²
The paper introduces HOV-SG (Hierarchical Open-Vocabulary 3D Scene Graphs), a novel approach for language-grounded indoor robot navigation. HOV-SG leverages open-vocabulary vision-language models to create state-of-the-art 3D segment-level maps and constructs a hierarchical 3D scene graph that includes floor, room, and object concepts. This graph is designed to handle multi-story buildings and enable effective robotic navigation using a cross-floor Voronoi graph. The method is evaluated on three datasets, demonstrating superior open-vocabulary semantic accuracy at the object, room, and floor levels, while reducing representation size by 75% compared to dense open-vocabulary maps. Real-world experiments with a Boston Dynamics Spot robot show successful long-horizon language-conditioned navigation in multi-story environments, validating the efficacy and generalization capabilities of HOV-SG. The code and evaluation protocol are made publicly available to foster future research.
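To make the floor/room/object hierarchy concrete, here is a minimal sketch of how such a graph could be queried top-down. It assumes each node already stores an embedding from some vision-language model and that the language query has already been split into floor, room, and object parts. The class `Node`, the functions `hierarchical_query` and `flat_query`, and the tiny hand-made 3-dimensional vectors are all hypothetical stand-ins for real features. This is an illustration of hierarchical retrieval by cosine similarity, not HOV-SG's actual implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    """Hypothetical scene-graph node: a label, an assumed open-vocabulary feature, children."""
    name: str
    feature: List[float]
    children: List["Node"] = field(default_factory=list)


def best_child(parent: Node, query: Sequence[float]) -> Node:
    """Return the child whose feature is most similar to the query embedding."""
    return max(parent.children, key=lambda c: cosine(c.feature, query))


def hierarchical_query(root: Node, floor_q: Sequence[float], room_q: Sequence[float],
                       object_q: Sequence[float]) -> Tuple[Node, Node, Node]:
    """Greedy top-down search: pick a floor, then a room on it, then an object in that room."""
    floor = best_child(root, floor_q)
    room = best_child(floor, room_q)
    obj = best_child(room, object_q)
    return floor, room, obj


def flat_query(root: Node, object_q: Sequence[float]) -> Node:
    """Baseline that ignores the hierarchy and scores every object in the building."""
    objects = [o for fl in root.children for rm in fl.children for o in rm.children]
    return max(objects, key=lambda o: cosine(o.feature, object_q))


# Toy two-story building with made-up 3-D features (stand-ins for real embeddings).
building = Node("building", [0.0, 0.0, 0.0], [
    Node("floor_0", [1.0, 0.0, 0.0], [
        Node("kitchen", [0.0, 1.0, 0.0], [Node("fridge", [0.0, 0.0, 1.0]),
                                          Node("table", [0.5, 0.5, 0.0])]),
        Node("living_room", [0.0, 0.2, 1.0], [Node("sofa", [1.0, 0.0, 0.0])]),
    ]),
    Node("floor_1", [0.0, 1.0, 0.0], [
        Node("bedroom", [0.0, 0.0, 1.0], [Node("bed", [0.0, 1.0, 0.0]),
                                          Node("chair", [1.0, 0.0, 0.0])]),
        Node("bathroom", [1.0, 0.0, 0.0], [Node("sink", [0.0, 0.0, 1.0])]),
    ]),
])

f, r, o = hierarchical_query(building, [0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.1, 0.0])
print(f.name, r.name, o.name)                            # floor_1 bedroom chair
print(flat_query(building, [0.9, 0.1, 0.0]).name)        # sofa
```

In this toy data the sofa and the chair have identical features, so the flat search simply returns whichever comes first (the sofa), while the hierarchical search uses the floor and room parts of the query to disambiguate and returns the chair in the bedroom. How the query is decomposed and how node features are computed are left out and would need to follow the paper itself.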