PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

4 Jun 2024 | Yupeng Zheng*, Zebin Xing*, Qichao Zhang**, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao (Fellow, IEEE)
PlanAgent is a multi-modal large language model (MLLM)-based planning system for closed-loop vehicle motion planning in autonomous driving. It addresses the limitations of existing rule-based and learning-based methods by leveraging the MLLM's common-sense reasoning and generalization capabilities. The system consists of three core modules: Environment Transformation, Reasoning Engine, and Reflection. The Environment Transformation module constructs a bird's-eye-view (BEV) map and a lane-graph-based textual description of the environment. The Reasoning Engine generates planner code through hierarchical chain-of-thought reasoning, and the Reflection module simulates and evaluates the generated planner to reduce the effect of MLLM uncertainty.
On the nuPlan Val14 and Test14-hard benchmarks, PlanAgent outperforms existing state-of-the-art methods in both common and long-tailed scenarios, demonstrating competitive and generalizable performance. Its scene description also uses fewer tokens, which improves efficiency. Ablation studies show that the Environment Transformation and Reasoning Engine modules each improve performance significantly, and qualitative results illustrate how PlanAgent handles complex driving scenarios. The system is also tested with several different MLLMs and remains effective across them. Its limitations include sensitivity to prompt quality and the computational burden of MLLM inference. Future work focuses on improving the MLLM's scene understanding, reducing how often the MLLM is called, and using closed-loop simulators to better align its decisions with human driving behavior.
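The interaction between the three modules can be sketched as a generate, simulate, and retry loop. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: `Scene`, `PlannerCandidate`, `plan_with_reflection`, the score threshold, the retry limit, and the feedback string are all hypothetical names and values, and the MLLM and simulator are replaced by plain callables so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Scene:
    """Hypothetical container for the Environment Transformation output."""
    bev_image: bytes        # rendered BEV map (placeholder bytes here)
    lane_graph_text: str    # lane-graph-based textual description


@dataclass
class PlannerCandidate:
    """Hypothetical stand-in for planner code produced by the Reasoning Engine."""
    code: str


def plan_with_reflection(
    scene: Scene,
    reason: Callable[[Scene, Optional[str]], PlannerCandidate],
    simulate_and_score: Callable[[PlannerCandidate, Scene], float],
    score_threshold: float = 0.8,   # assumed value, not from the source
    max_rounds: int = 3,            # assumed value, not from the source
) -> Tuple[PlannerCandidate, float, int]:
    """Ask for a planner, score it in simulation, and retry while the score is low.

    Returns the best candidate seen, its score, and the number of rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    feedback: Optional[str] = None
    best: Optional[PlannerCandidate] = None
    best_score = float("-inf")

    for round_idx in range(1, max_rounds + 1):
        candidate = reason(scene, feedback)            # Reasoning Engine step
        score = simulate_and_score(candidate, scene)   # Reflection step
        if score > best_score:
            best, best_score = candidate, score
        if score >= score_threshold:
            return best, best_score, round_idx
        # Hypothetical feedback format passed to the next reasoning round.
        feedback = f"Previous planner scored {score:.2f}; revise it."

    return best, best_score, max_rounds


# Stub callables standing in for the MLLM and the simulator.
def demo_reason(scene: Scene, feedback: Optional[str]) -> PlannerCandidate:
    # The first attempt is aggressive; after feedback the stub slows down.
    speed = 15.0 if feedback is None else 8.0
    return PlannerCandidate(code=f"target_speed = {speed}")


def demo_score(candidate: PlannerCandidate, scene: Scene) -> float:
    # Toy scoring rule: the slower planner is treated as the safer one.
    return 0.9 if "8.0" in candidate.code else 0.5


demo_scene = Scene(bev_image=b"", lane_graph_text="ego in lane 0; lead vehicle 12 m ahead")
best, score, rounds = plan_with_reflection(demo_scene, demo_reason, demo_score)
print(rounds, score, best.code)  # prints: 2 0.9 target_speed = 8.0
```

In this sketch the first stub planner scores below the threshold, so the loop asks for a revision and accepts the second one. Keeping the best candidate seen so far means the loop still returns something usable if no round reaches the threshold.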