20 Mar 2024 | Xiaohan Lei¹, Min Wang², Wengang Zhou¹,², and Houqiang Li¹,²
GaussNav is a novel framework for Instance ImageGoal Navigation (IIN) that uses 3D Gaussian Splatting (3DGS) to build a new map representation. Because this map preserves both the 3D geometry and the texture details of the scene, an agent can navigate to a specific object instance in a previously unexplored environment. The framework consists of three stages: Sub-gaussians Division, Semantic Gaussian Construction, and Gaussian Navigation. In the first stage, the agent explores the environment and divides its observations into sub-gaussians. In the second stage, a semantic label is assigned to each gaussian, which allows the objects in the scene to be reconstructed. In the third stage, descriptive images are generated and matched against the goal image to locate the target object. The framework thereby transforms the IIN task into a more manageable PointGoal Navigation task. On the Habitat-Matterport 3D (HM3D) dataset, GaussNav achieves a Success of 0.725 and an SPL of 0.578, a significant improvement over previous state-of-the-art methods. Ablation studies show the importance of the classifier and matching modules in achieving this performance. The method is also efficient: semantic labels reduce the search space and minimize the need for extensive exploration. Overall, GaussNav advances embodied visual navigation by combining 3D Gaussian Splatting with semantic information to enable accurate and efficient navigation.
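For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Success and SPL (Success weighted by Path Length) are commonly computed in embodied navigation benchmarks. It is not taken from the GaussNav code. SPL is usually defined as the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is 1 if episode i succeeded and 0 otherwise, l_i is the shortest-path length from start to goal, and p_i is the length of the path the agent actually took. The function names and the sample episodes are illustrative assumptions.

```python
from typing import Sequence


def success_rate(successes: Sequence[int]) -> float:
    """Fraction of episodes in which the agent reached the goal."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[int],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- 1 if episode i succeeded, else 0
    shortest_lengths[i] -- shortest-path length from start to goal (l_i)
    path_lengths[i]     -- length of the path the agent actually took (p_i)
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if len(shortest_lengths) != n or len(path_lengths) != n:
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        denom = max(p, l)
        # Guard against a degenerate episode where start and goal coincide
        # and the agent did not move; count it as fully efficient if it succeeded.
        total += float(s) if denom == 0 else s * l / denom
    return total / n


# Hypothetical episodes, for illustration only (not results from the paper).
episode_success = [1, 0, 1]
episode_shortest = [10.0, 5.0, 8.0]
episode_taken = [12.5, 7.0, 8.0]

print(success_rate(episode_success))
print(spl(episode_success, episode_shortest, episode_taken))
```

A failed episode contributes zero no matter how short the path was, and a successful episode is scaled down by how much longer the agent's path was than the shortest one. This is why SPL never exceeds Success, as in the reported 0.578 versus 0.725.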