4 Jun 2024 | Yupeng Zheng*, Zebin Xing*, Qichao Zhang**, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao, Fellow, IEEE
**PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning**
**Abstract:**
Vehicle motion planning is a critical component of autonomous driving technology. Current rule-based methods perform well in common scenarios but struggle with long-tailed situations, while learning-based methods often fail to outperform rule-based approaches in large-scale closed-loop evaluation. To address these issues, PlanAgent is introduced as the first mid-to-mid planning system based on a Multi-modal Large Language Model (MLLM), bringing human-like knowledge, interpretability, and common-sense reasoning into closed-loop planning. PlanAgent comprises three core modules: an Environment Transformation module that constructs a Bird's Eye View (BEV) map and a lane-graph-based textual description; a Reasoning Engine module that introduces a hierarchical chain-of-thought, moving from scene understanding to lateral and longitudinal motion instructions and culminating in planner code generation; and a Reflection module that simulates and evaluates the generated planner to reduce the MLLM's uncertainty. PlanAgent is evaluated on the nuPlan benchmarks, demonstrating superior performance in both common and challenging long-tailed scenarios.
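To make the three-module pipeline concrete, here is a minimal Python sketch of how the loop described in the abstract might fit together. Every name in it (`SceneContext`, `environment_transformation`, `mllm.generate`, `simulator.evaluate`, the retry budget) is a hypothetical illustration inferred from the abstract, not the authors' actual interface or code.

```python
# Hypothetical sketch of the PlanAgent closed-loop pipeline, inferred from
# the abstract. All names and interfaces below are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class SceneContext:
    bev_image: bytes        # rendered Bird's Eye View map (multi-modal input)
    lane_graph_text: str    # lane-graph-based textual scene description


def environment_transformation(raw_scene: dict) -> SceneContext:
    # Stub: a real implementation would rasterize a BEV map and extract a
    # lane-graph description from the simulator state.
    return SceneContext(bev_image=raw_scene["bev_png"],
                        lane_graph_text=raw_scene["lane_graph_text"])


def reasoning_engine(mllm, ctx: SceneContext) -> str:
    # Hierarchical chain-of-thought: scene understanding, then lateral and
    # longitudinal motion instructions, then executable planner code.
    # `mllm.generate` is an assumed multi-modal LLM interface.
    prompt = ("Describe the scene, then give lateral and longitudinal motion "
              "instructions, then emit planner code.\n" + ctx.lane_graph_text)
    return mllm.generate(images=[ctx.bev_image], text=prompt)


def reflection(simulator, planner_code: str) -> float:
    # Run the generated planner in closed-loop simulation and score it.
    # `simulator.evaluate` is an assumed scoring interface.
    return simulator.evaluate(planner_code)


def plan_agent(mllm, simulator, raw_scene: dict,
               score_threshold: float = 0.8, max_tries: int = 3) -> str:
    ctx = environment_transformation(raw_scene)
    planner_code = ""
    for _ in range(max_tries):
        planner_code = reasoning_engine(mllm, ctx)
        if reflection(simulator, planner_code) >= score_threshold:
            return planner_code  # accept once the planner passes simulation
    return planner_code  # fall back to the last candidate
```

The retry loop reflects the Reflection module's stated role of reducing the MLLM's uncertainty: candidate planners are only accepted after passing a simulated evaluation; the specific threshold and retry count here are placeholders.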
**Contributions:**
- Introduction of PlanAgent, the first mid-to-mid planning agent based on an MLLM.
- Development of an Environment Transformation module for efficient scene information representation.
- Design of a Reasoning Engine module for common-sense reasoning and safety planning.
- Evaluation on nuPlan benchmarks showing competitive and generalizable performance.
**Keywords:**
Multi-modal Language Model, Language Agent, Autonomous Driving, Closed-loop Motion Planning.