6 Feb 2024 | Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang, Chang Liu
VoroNav is a novel semantic exploration framework for Zero-Shot Object Navigation (ZSON) that leverages a Reduced Voronoi Graph (RVG) to extract exploratory paths and planning nodes from a real-time semantic map. By integrating topological and semantic information, VoroNav generates text-based descriptions of paths and observed images that are interpretable by large language models (LLMs); combined with the LLM's commonsense reasoning, these descriptions enable the agent to select waypoints for navigation. VoroNav outperforms existing methods on the HM3D and HSSD benchmarks in both success rate and exploration efficiency, with gains of +2.8% success rate and +3.7% SPL on HM3D, and +2.6% and +3.8% on HSSD. The framework also introduces metrics for evaluating obstacle avoidance and perceptual efficiency, further validating its effectiveness. By combining path and farsight descriptions, VoroNav provides the LLM with holistic scene information, leading to more informed decision-making.

The system comprises three modules: a semantic mapping module that constructs the semantic map, a global decision module that generates the RVG and queries the LLM for waypoint selection, and a local policy module that plans low-level actions. Experiments show that VoroNav achieves state-of-the-art results in ZSON, demonstrating improved navigation and exploration capabilities. The framework addresses limitations of traditional methods through its fusion of textual and semantic information, enhancing the agent's ability to navigate unfamiliar environments. VoroNav sets a new benchmark for ZSON and opens new possibilities for intelligent robotic systems.
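The summary does not spell out how the RVG is built from the semantic map, but a common way to approximate a Voronoi graph over a 2D grid map is to skeletonize the free space and keep only skeleton endpoints and junctions as planning nodes. The sketch below illustrates that idea only; the function name `reduced_voronoi_graph`, the skeletonization step, and the degree-based pruning are illustrative assumptions, not VoroNav's exact procedure.

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import skeletonize

def reduced_voronoi_graph(free_mask: np.ndarray):
    """Approximate a Reduced Voronoi Graph from a 2D free-space mask.

    free_mask: boolean array, True where the map cell is traversable.
    Returns (nodes, skeleton): junction/endpoint coordinates usable as
    planning nodes, and the skeleton mask whose branches act as paths.
    """
    # Skeletonizing free space approximates the generalized Voronoi
    # diagram: skeleton pixels stay maximally distant from obstacles.
    skeleton = skeletonize(free_mask)

    # Count 8-connected skeleton neighbours at every skeleton pixel.
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])
    neighbour_count = convolve(skeleton.astype(int), kernel, mode="constant")

    # Keep only structurally informative pixels as planning nodes:
    # endpoints (1 neighbour) and junctions (3+ neighbours). Degree-2
    # pixels are ordinary path points and are dropped, "reducing" the graph.
    degree = np.where(skeleton, neighbour_count, 0)
    nodes = np.argwhere((degree == 1) | (degree >= 3))
    return nodes, skeleton

# Example: a toy 20x20 map with an L-shaped corridor of free space.
free = np.zeros((20, 20), dtype=bool)
free[2:18, 8:12] = True
free[8:12, 2:18] = True
nodes, skel = reduced_voronoi_graph(free)
print(nodes)  # junction near the corridor crossing, plus branch endpoints
```

In a pipeline like VoroNav's, the resulting nodes would be the candidate waypoints described in text for the LLM, while the skeleton branches correspond to the exploratory paths; the actual system additionally attaches semantic labels from the map to each path and node.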