4 Jun 2024 | Yupeng Zheng*, Zebin Xing*, Qichao Zhang**, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao, Fellow, IEEE
**PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning**
**Abstract:**
Vehicle motion planning is a critical component of autonomous driving technology. Current rule-based methods perform well in common scenarios but struggle with long-tailed situations, while learning-based methods often fail to outperform rule-based approaches in large-scale closed-loop evaluation. To address these issues, PlanAgent is introduced as the first mid-to-mid planning system based on a Multi-modal Large Language Model (MLLM), bringing human-like knowledge, interpretability, and common-sense reasoning into closed-loop planning. PlanAgent comprises three core modules: an Environment Transformation module that constructs a Bird's Eye View (BEV) map and a lane-graph-based textual description; a Reasoning Engine module that introduces a hierarchical chain-of-thought, moving from scene understanding to lateral and longitudinal motion instructions and culminating in planner code generation; and a Reflection module that simulates and evaluates the generated planner to reduce the MLLM's uncertainty. PlanAgent is evaluated on the nuPlan benchmarks, demonstrating superior performance in both common and challenging long-tailed scenarios.
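To make the three-module pipeline concrete, here is a minimal Python sketch of how the loop described in the abstract might fit together. Every name in it (`SceneContext`, `environment_transformation`, `mllm.generate`, `simulator.evaluate`, the retry budget) is a hypothetical illustration inferred from the abstract, not the authors' actual interface or code.

```python
# Hypothetical sketch of the PlanAgent closed-loop pipeline, inferred from
# the abstract. All names and interfaces below are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class SceneContext:
    bev_image: bytes        # rendered Bird's Eye View map (multi-modal input)
    lane_graph_text: str    # lane-graph-based textual scene description


def environment_transformation(raw_scene: dict) -> SceneContext:
    # Stub: a real implementation would rasterize a BEV map and extract a
    # lane-graph description from the simulator state.
    return SceneContext(bev_image=raw_scene["bev_png"],
                        lane_graph_text=raw_scene["lane_graph_text"])


def reasoning_engine(mllm, ctx: SceneContext) -> str:
    # Hierarchical chain-of-thought: scene understanding, then lateral and
    # longitudinal motion instructions, then executable planner code.
    # `mllm.generate` is an assumed multi-modal LLM interface.
    prompt = ("Describe the scene, then give lateral and longitudinal motion "
              "instructions, then emit planner code.\n" + ctx.lane_graph_text)
    return mllm.generate(images=[ctx.bev_image], text=prompt)


def reflection(simulator, planner_code: str) -> float:
    # Run the generated planner in closed-loop simulation and score it.
    # `simulator.evaluate` is an assumed scoring interface.
    return simulator.evaluate(planner_code)


def plan_agent(mllm, simulator, raw_scene: dict,
               score_threshold: float = 0.8, max_tries: int = 3) -> str:
    ctx = environment_transformation(raw_scene)
    planner_code = ""
    for _ in range(max_tries):
        planner_code = reasoning_engine(mllm, ctx)
        if reflection(simulator, planner_code) >= score_threshold:
            return planner_code  # accept once the planner passes simulation
    return planner_code  # fall back to the last candidate
```

The retry loop reflects the Reflection module's stated role of reducing the MLLM's uncertainty: candidate planners are only accepted after passing a simulated evaluation; the specific threshold and retry count here are placeholders.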
**Contributions:**
- Introduction of PlanAgent, the first mid-to-mid planning agent based on an MLLM.
- Development of an Environment Transformation module for efficient scene information representation.
- Design of a Reasoning Engine module for common-sense reasoning and safety planning.
- Evaluation on nuPlan benchmarks showing competitive and generalizable performance.
**Keywords:**
Multi-modal Language Model, Language Agent, Autonomous Driving, Closed-loop Motion Planning.