MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

**Authors:** Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

**Project:** <https://chen-judge.github.io/MapGPT/>

**Abstract:** Embodied agents equipped with GPT have demonstrated exceptional decision-making and generalization abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt GPT-4 to select potential locations within localized environments, without constructing an effective "global view" of the overall environment. This work introduces MapGPT, a map-guided GPT-based agent that encourages global exploration by building an online linguistic-formed map. The map includes node information and topological relationships, helping GPT understand the spatial environment. An adaptive planning mechanism assists the agent in performing multi-step path planning based on the map, enabling systematic exploration of multiple candidate nodes or sub-goals. Extensive experiments show that MapGPT achieves state-of-the-art zero-shot performance on the R2R and REVERIE datasets, demonstrating the enhanced global thinking and path planning abilities of GPT.

**Contributions:**

- Introduce a map-guided prompting method that builds an online linguistic-formed map, including node information and topological relationships.
- Propose an adaptive planning mechanism that enables multi-step path planning based on the map.
- Achieve state-of-the-art zero-shot performance on the R2R and REVERIE datasets, showcasing improved global thinking and path planning capabilities.

**Methods:**

- **Single Expert Prompt System:** A unified prompt system that integrates the task description, fundamental inputs (instruction, visual observation, action space, history), and map annotations.
- **Map-Guided Prompting:** Convert topological relationships into textual prompts to help GPT understand the navigation environment.
- **Adaptive Path Planning:** Enable multi-step path planning by dynamically generating and updating a plan based on the map and the previous plan.

**Experiments:**

- **Datasets and Evaluation:** R2R, which provides detailed step-by-step instructions, and REVERIE, which provides high-level instructions.
- **Results:** MapGPT outperforms previous zero-shot methods on both datasets, with significant improvements in success rate and oracle success rate.

**Conclusion:** MapGPT is a zero-shot agent for VLN tasks that combines map-guided prompting with adaptive path planning to achieve state-of-the-art zero-shot performance. Future work could focus on real-world deployment and the challenges that arise there.
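To make the map-guided prompting idea more concrete, the following is a minimal sketch of how an online topological map might be recorded and rendered as text for a prompt. It is not the authors' implementation: the `TopoMap` class, the integer place IDs, and the prompt wording are all hypothetical, and the breadth-first search in `path` is a stand-in assumption for how a route to a previously observed but non-adjacent node could be found on the map.

```python
from collections import deque


class TopoMap:
    """Hypothetical online topological map: nodes are places, edges are observed links."""

    def __init__(self):
        self.neighbors = {}  # place id -> set of connected place ids
        self.visited = []    # place ids in the order the agent visited them

    def observe(self, current, candidates):
        """Record a visit to `current` and the navigable places seen from it."""
        if current not in self.visited:
            self.visited.append(current)
        self.neighbors.setdefault(current, set())
        for node in candidates:
            self.neighbors[current].add(node)
            self.neighbors.setdefault(node, set()).add(current)

    def to_prompt(self):
        """Render the map as plain text (assumed wording, not the paper's format)."""
        lines = []
        for node in self.visited:
            linked = ", ".join(f"Place {n}" for n in sorted(self.neighbors[node]))
            lines.append(f"Place {node} is connected with {linked}")
        unexplored = sorted(n for n in self.neighbors if n not in self.visited)
        if unexplored:
            lines.append("Unexplored: " + ", ".join(f"Place {n}" for n in unexplored))
        return "\n".join(lines)

    def path(self, start, goal):
        """Shortest route over known edges via breadth-first search (an assumption)."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            route = queue.popleft()
            if route[-1] == goal:
                return route
            for nxt in sorted(self.neighbors.get(route[-1], ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(route + [nxt])
        return None  # goal is not reachable through the known map


# Example: the agent starts at place 0, sees places 1 and 2, moves to 1, sees 3.
topo = TopoMap()
topo.observe(0, [1, 2])
topo.observe(1, [3])
map_text = topo.to_prompt()
route = topo.path(1, 2)  # route back to an unexplored place seen earlier
print(map_text)
print(route)
```

In this sketch the text returned by `to_prompt` would be appended to the prompt at every step, so the model sees all observed places rather than only the current neighbors, and `path` shows how a multi-step plan toward an earlier candidate could be executed over the recorded edges.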


20 Jun 2024 | Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong