VLP: Vision Language Planning for Autonomous Driving


9 Mar 2024 | Chenbin Pan, Burhaneddin Yaman, Tommaso Nesti, Abhirup Mallik, Alessandro G Allievi, Senem Velipasalar, Liu Ren
The paper introduces VLP (Vision Language Planning), a novel framework that integrates language models to enhance autonomous driving systems (ADS). VLP aims to bridge the gap between linguistic understanding and autonomous driving by leveraging the common-sense reasoning of large language models (LLMs). The framework consists of two key components: the Agent-centric Learning Paradigm (ALP) and the Self-driving-car-centric Learning Paradigm (SLP). ALP refines local details in the bird's-eye-view (BEV) feature map, while SLP guides the planning process by aligning the ego-vehicle query with the intended goal and driving status.

VLP achieves state-of-the-art performance on the nuScenes dataset, with a 35.9% reduction in average L2 error and a 60.5% reduction in collision rate compared to previous methods, and it generalizes well to new urban environments and long-tail scenarios. Ablation studies and further experiments validate the effectiveness of VLP across driving tasks such as multi-object tracking, mapping, motion forecasting, occupancy prediction, and 3D object detection.
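The idea shared by ALP and SLP is to pull vision-side features (per-agent BEV features for ALP, the ego-vehicle planning query for SLP) toward text embeddings of prompts describing the scene and the driving intention, produced by a frozen language encoder. Below is a minimal, hypothetical PyTorch sketch of such a CLIP-style contrastive alignment head; the module, prompt, and dimension names are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisionLanguageAlign(nn.Module):
    """Hypothetical contrastive alignment head in the spirit of VLP's ALP/SLP."""

    def __init__(self, vision_dim: int, text_dim: int, embed_dim: int = 256):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, embed_dim)   # projects BEV/ego features
        self.text_proj = nn.Linear(text_dim, embed_dim)       # projects frozen LM embeddings
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ~log(1/0.07), CLIP-style temperature

    def forward(self, vision_feats, text_feats):
        # vision_feats: (N, vision_dim) agent-level BEV features (ALP) or ego-vehicle queries (SLP)
        # text_feats:   (N, text_dim) embeddings of driving prompts, e.g. "the ego car turns left",
        #               assumed to come from a frozen language encoder
        v = F.normalize(self.vision_proj(vision_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        logits = self.logit_scale.exp() * v @ t.t()            # (N, N) cosine-similarity logits
        targets = torch.arange(v.size(0), device=v.device)     # matched pairs lie on the diagonal
        # symmetric InfoNCE loss: vision-to-text and text-to-vision
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Shape-check example with random tensors (dimensions are assumptions)
align_head = VisionLanguageAlign(vision_dim=256, text_dim=768)
loss = align_head(torch.randn(8, 256), torch.randn(8, 768))

In this sketch, the alignment loss would be added to the usual planning and perception losses during training only, so the language encoder is not needed at inference time.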