GaussNav: Gaussian Splatting for Visual Navigation

20 Mar 2024 | Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li
**Authors:** Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li

**Institution:** University of Science and Technology of China, Hefei Comprehensive National Science Center

**Project Page:** https://xiaohanlei.github.io/projects/GaussNav/

**Abstract:** Instance ImageGoal Navigation (IIN) is a challenging task in embodied vision, requiring agents to locate a specific object within an unexplored environment. Existing map-based navigation methods often use Bird's Eye View (BEV) maps, which lack detailed scene textures. To address this, the authors propose GaussNav, a novel framework that uses 3D Gaussian Splatting (3DGS) to construct a map representation that retains both geometric and semantic information, as well as textural features. GaussNav consists of three stages: Sub-gaussians Division, Semantic Gaussian Construction, and Gaussian Navigation. This framework significantly improves performance, achieving a Success weighted by Path Length (SPL) of 0.578 on the Habitat-Matterport 3D (HM3D) dataset, a substantial improvement over previous methods.

**Keywords:** Embodied Visual Navigation, 3D Gaussian Splatting

**Introduction:** Embodied Artificial Intelligence (EAI) aims to enable agents to explore, learn, reason, and interact in the physical world. The IIN task involves navigating to a specific object instance depicted in a goal image, distinguishing it from other visually similar instances. Traditional methods focus on constructing explicit episodic BEV maps, which lack 3D geometrical and textural information. GaussNav addresses these limitations by using 3DGS to create a Semantic Gaussian map that retains scene geometry, semantic labels, and texture details.

**Methods:**

1. **Sub-gaussians Division:** Divides observations into subsets for constructing sub-gaussians.
2. **Semantic Gaussian Construction:** Assigns semantic labels to each gaussian and clusters them based on semantic labels and 3D positions.
3. **Gaussian Navigation:** Classifies the goal image, generates descriptive images, matches them with the goal, and uses path planning to navigate to the target object.

**Experiments:**

- **Dataset:** HM3D
- **Evaluation Metrics:** Success Rate (Success) and SPL
- **Results:** GaussNav outperforms baselines and previous state-of-the-art methods, achieving a significant improvement in SPL.

**Conclusion:** GaussNav introduces a novel map representation that enhances the ability of agents to navigate in complex environments by retaining detailed scene information. The framework's effectiveness is demonstrated through experiments on the HM3D dataset, showing a substantial improvement in performance.
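
**Note on the metrics:** The summary reports Success Rate and SPL without spelling out how they are computed. As commonly defined in the embodied navigation literature, SPL averages, over all episodes, the success indicator multiplied by the ratio of the shortest-path length to the larger of the agent's path length and the shortest-path length. The sketch below illustrates that standard definition only; it is not taken from the GaussNav code, and the `Episode` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One navigation episode (hypothetical container, not from the paper)."""
    success: bool         # whether the agent stopped at the goal instance
    shortest_path: float  # shortest-path length from start to goal
    agent_path: float     # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(ep.success for ep in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.shortest_path <= 0:
            raise ValueError("shortest_path must be positive")
        if ep.success:
            # A path longer than the shortest one is penalised; the max()
            # keeps the per-episode ratio from exceeding 1.
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
    return total / len(episodes)


if __name__ == "__main__":
    # Made-up episodes purely for illustration.
    demo = [
        Episode(success=True, shortest_path=5.0, agent_path=5.0),
        Episode(success=True, shortest_path=5.0, agent_path=10.0),
        Episode(success=False, shortest_path=4.0, agent_path=12.0),
        Episode(success=True, shortest_path=3.0, agent_path=4.0),
    ]
    print(f"Success: {success_rate(demo):.4f}")
    print(f"SPL:     {spl(demo):.4f}")
```

Because failed episodes contribute zero and detours shrink the per-episode ratio, SPL can never exceed the Success Rate; it rewards agents that reach the goal along near-optimal paths.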