OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

28 Mar 2024 | Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue*
OpenGraph is a novel framework designed for open-vocabulary hierarchical 3D graph representation in large-scale outdoor environments. It addresses the limitations of existing open-vocabulary maps, which are primarily designed for small-scale environments and lack robust reasoning capabilities and efficient map structures. OpenGraph leverages Visual-Language Models (VLMs) and Large Language Models (LLMs) to enhance object comprehension and reasoning. The framework consists of three main modules: Caption-Enhanced Object Comprehension, Object-Centric Map Construction, and Hierarchical Graph Representation Formation. It extracts instance masks and captions from 2D images, projects these onto 3D LiDAR point clouds, and constructs a hierarchical graph based on lane graph connectivity. Validation on the SemanticKITTI dataset demonstrates that OpenGraph achieves superior segmentation and query accuracy, making it suitable for various downstream tasks such as zero-shot semantic segmentation, open-vocabulary object retrieval, structured topology query, global path planning, and interactive map updating.
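
To make the "project 2D masks onto 3D LiDAR points" step more concrete, here is a minimal sketch of one common way such a projection can be done. It is not taken from the OpenGraph paper: it assumes a simple pinhole camera, points already expressed in the camera frame, and a mask stored as a grid of integer instance IDs. The function name `label_points_with_masks` and all parameter names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def label_points_with_masks(
    points_cam: Sequence[Point3D],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[int]]:
    """Give each 3D point the instance ID of the pixel it projects onto.

    Assumptions (illustrative, not from the paper):
      - points_cam are LiDAR points already transformed into the camera
        frame (x right, y down, z forward).
      - mask[row][col] holds an instance ID, with 0 meaning background.
      - fx, fy, cx, cy are pinhole intrinsics in pixels.

    Returns one entry per point: an instance ID, or None when the point is
    behind the camera, falls outside the image, or lands on background.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels: List[Optional[int]] = []
    for x, y, z in points_cam:
        if z <= 0.0:
            # Points at or behind the image plane cannot be seen.
            labels.append(None)
            continue
        # Pinhole projection: divide by depth, then scale and shift.
        col = math.floor(fx * x / z + cx)
        row = math.floor(fy * y / z + cy)
        if not (0 <= row < height and 0 <= col < width):
            labels.append(None)
            continue
        instance_id = mask[row][col]
        labels.append(instance_id if instance_id != 0 else None)
    return labels


# Toy 4x4 mask with two instances (IDs 1 and 2) on a background of 0.
toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
toy_points = [
    (0.0, 0.0, 1.0),    # projects to the image centre
    (-0.5, -0.5, 1.0),  # projects up and to the left
    (0.0, 0.0, -1.0),   # behind the camera
    (5.0, 0.0, 1.0),    # outside the image
]
toy_labels = label_points_with_masks(toy_points, toy_mask, 2.0, 2.0, 2.0, 2.0)
```

In a pipeline like the one summarized above, points that share an instance ID could then be grouped into a 3D object and paired with that instance's caption; how OpenGraph actually performs this association and merges observations across frames is not specified in this summary.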