Clio: Real-time Task-Driven Open-Set 3D Scene Graphs

29 Apr 2024 | Dominic Maggio, Yun Chang, Nathan Hughes, Matthew Trang, Dan Griffith, Carolyn Dougherty, Eric Cristofalo, Lukas Schmid, Luca Carlone
Clio is a real-time system for task-driven open-set 3D scene graph construction. The paper poses a task-driven 3D scene understanding problem: a robot is given a list of tasks in natural language and must choose the granularity and the subset of objects and scene structure to keep in its map, so that the map is sufficient to complete those tasks. This problem is naturally formulated using the Information Bottleneck (IB) framework, and the paper presents an Agglomerative IB algorithm that incrementally clusters 3D primitives into task-relevant objects and regions.

The algorithm is integrated into Clio, which constructs a hierarchical 3D scene graph of the environment online using onboard compute. Clio is tested on the Replica dataset and in four real environments. The results indicate that it builds compact open-set 3D scene graphs in real time and improves task execution accuracy by limiting the map to relevant semantic concepts. The system is also run on a Boston Dynamics Spot robot, showing that it can support real-time task execution.

The paper stresses that in robotics the tasks to be performed should determine the granularity of the map. Clio is compared with other methods on how well it clusters objects and regions by task relevance, and it is evaluated with metrics including object detection accuracy, precision, recall, and F1 score in both open-set and closed-set scenarios. The paper concludes that Clio provides a task-driven approach to 3D metric-semantic mapping that lets robots build maps sufficient to support their tasks.
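To make the clustering idea concrete, here is a minimal sketch of greedy agglomerative IB merging over a graph of primitives. It is an illustration under stated assumptions, not Clio's implementation: the function names, the uniform prior over primitives, the absolute loss budget `max_loss`, and the toy task distributions are all hypothetical. The summary does not say how Clio derives the task-relevance distributions p(y|x) or how it chooses a stopping threshold. Merging is restricted to primitives connected by an edge, standing in for spatial adjacency.

```python
import math


def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p_k == 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def merge_cost(w_i, p_i, w_j, p_j):
    """Task-relevant information lost by merging two clusters.

    This is (w_i + w_j) times the weighted Jensen-Shannon divergence
    between the clusters' task distributions p(y|c).
    """
    w = w_i + w_j
    mix = [(w_i * a + w_j * b) / w for a, b in zip(p_i, p_j)]
    return w_i * kl(p_i, mix) + w_j * kl(p_j, mix)


def agglomerative_ib(p_y_given_x, edges, max_loss):
    """Greedily merge adjacent primitives while the information loss stays small.

    p_y_given_x: one task distribution per primitive (assumed given).
    edges:       pairs of primitive indices that are allowed to merge.
    max_loss:    hypothetical absolute budget (nats) on the total loss.
    Returns the clusters as sorted lists of primitive indices.
    """
    n = len(p_y_given_x)
    weight = {i: 1.0 / n for i in range(n)}  # assumed uniform prior p(x)
    dist = {i: list(p) for i, p in enumerate(p_y_given_x)}
    members = {i: [i] for i in range(n)}
    adj = {(min(a, b), max(a, b)) for a, b in edges if a != b}
    total_loss = 0.0

    def cost(edge):
        i, j = edge
        return merge_cost(weight[i], dist[i], weight[j], dist[j])

    while adj:
        # Pick the cheapest merge; ties are broken by edge order.
        loss, (i, j) = min((cost(e), e) for e in adj)
        if total_loss + loss > max_loss:
            break
        total_loss += loss
        # Fold cluster j into cluster i.
        w = weight[i] + weight[j]
        dist[i] = [(weight[i] * a + weight[j] * b) / w
                   for a, b in zip(dist[i], dist[j])]
        weight[i] = w
        members[i] += members[j]
        del weight[j], dist[j], members[j]
        # Redirect edges that touched j and drop self-loops.
        rewired = set()
        for a, b in adj:
            a = i if a == j else a
            b = i if b == j else b
            if a != b:
                rewired.add((min(a, b), max(a, b)))
        adj = rewired
    return sorted(sorted(m) for m in members.values())


# Toy example: four primitives in a chain, two hypothetical tasks.
# Primitives 0 and 1 are relevant to task A, primitives 2 and 3 to task B.
p = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
chain = [(0, 1), (1, 2), (2, 3)]
clusters = agglomerative_ib(p, chain, max_loss=0.01)  # [[0, 1], [2, 3]]
```

With a small budget, primitives that share a task distribution merge at no cost while the two task-specific groups stay separate. Raising `max_loss` trades task-relevant information for a more compact map, which is the granularity trade-off the summary describes.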