20 Jun 2024 | Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong
MapGPT is a map-guided prompting method for vision-and-language navigation (VLN) that enables GPT-based agents to perform global exploration and adaptive path planning. To address the limitations of previous zero-shot VLN agents, it builds an online map expressed in language, which records node information and topological relationships, and supplies this map to GPT to encourage global exploration. It also proposes an adaptive planning mechanism that helps the agent perform multi-step path planning over the map, systematically exploring multiple candidate nodes or sub-goals step by step.

Extensive experiments demonstrate that MapGPT achieves state-of-the-art zero-shot performance on R2R and REVERIE, with improvements in success rate (SR) of up to 10% and 12%, respectively. It outperforms existing zero-shot VLN agents, particularly on REVERIE, where it achieves a higher SR than some learning-based methods. The method is applicable to both GPT-4 and GPT-4V and can adapt to varying instruction styles, making it more unified and effective. The key contributions are the map-guided prompting method, the adaptive planning mechanism, and the application of MapGPT to both GPT-4 and GPT-4V. The results also demonstrate GPT's newly emergent global thinking and path planning abilities.
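To make the idea of an online map expressed in language more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the agent observes a set of candidate nodes at each step, stores them in a topological graph, and renders that graph as plain text for the prompt. The class name `LinguisticMap`, its methods, and the "Place N" wording are hypothetical choices made for illustration; the summary above does not specify the actual prompt format.

```python
class LinguisticMap:
    """Hypothetical topological map that is updated online and rendered as text."""

    def __init__(self):
        # node id -> list of node ids observed as reachable from it
        self.neighbors = {}
        # node ids in the order the agent visited them
        self.visited = []

    def observe(self, current, candidates):
        """Record the current node and the candidate nodes visible from it."""
        if current not in self.visited:
            self.visited.append(current)
        adjacent = self.neighbors.setdefault(current, [])
        for node in candidates:
            if node not in adjacent:
                adjacent.append(node)
            # Register the candidate so it can be listed as unexplored.
            self.neighbors.setdefault(node, [])

    def unexplored(self):
        """Nodes that have been seen but not yet visited (candidate sub-goals)."""
        return [n for n in self.neighbors if n not in self.visited]

    def to_prompt(self):
        """Render trajectory, connectivity, and unexplored nodes as prompt text."""
        lines = ["Trajectory: " + " -> ".join(f"Place {n}" for n in self.visited)]
        lines.append("Map:")
        for n in self.visited:
            links = ", ".join(str(m) for m in self.neighbors[n])
            lines.append(f"  Place {n} is connected with Places {links}")
        lines.append("Unexplored places: " + ", ".join(str(n) for n in self.unexplored()))
        return "\n".join(lines)


nav_map = LinguisticMap()
nav_map.observe(0, [1, 2])   # at Place 0 the agent sees Places 1 and 2
nav_map.observe(1, [0, 3])   # after moving to Place 1 it sees Places 0 and 3
print(nav_map.to_prompt())
```

Because the text lists unexplored places from every visited node, not only the current one, a language model reading it could in principle choose to backtrack to an earlier branch. That is the kind of global exploration the summary describes, although how MapGPT actually phrases and uses its map may differ from this sketch.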