9 Mar 2024 | Chenbin Pan, Burhaneddin Yaman, Tommaso Nesti, Abhirup Mallik, Alessandro G Allievi, Senem Velipasalar, Liu Ren
VLP: Vision Language Planning for Autonomous Driving
This paper presents VLP, a novel Vision-Language-Planning framework that integrates language models to bridge the gap between linguistic understanding and autonomous driving. VLP enhances autonomous driving systems by strengthening both the source memory foundation and the self-driving car's contextual understanding. On the challenging nuScenes dataset, VLP achieves state-of-the-art end-to-end planning performance, reducing the average L2 error by 35.9% and the collision rate by 60.5% compared to the previous best method. Moreover, VLP shows improved performance in challenging long-tail scenarios and strong generalization when faced with new urban environments.
VLP consists of two key components: the Agent-centric Learning Paradigm (ALP) and the Self-driving-car-centric Learning Paradigm (SLP), which leverage large language models (LLMs) to enhance the autonomous driving system (ADS) from the reasoning and decision-making aspects, respectively. The bird's-eye-view (BEV) feature map serves as the source memory pool of the ADS for downstream decoding tasks. It summarizes and encodes the driving environment surrounding the self-driving car, including vehicles, pedestrians, lanes, and more, into a unified feature map. Hence, capturing comprehensive and accurate details at each local position of the BEV is critical for safe and precise self-driving performance.
To enhance the local semantic representation and reasoning capabilities of the BEV, we introduce an innovative Agent-centric Learning Paradigm (ALP) module. ALP uses the consistent feature space of a pretrained language model to refine the agent features on the BEV, actively shaping their semantics and guiding the BEV reasoning process, as sketched below. By leveraging the common sense and logical reasoning embedded in the language model, ALP equips the ADS with a robust and consistent BEV feature space, enhancing its effectiveness in diverse driving scenarios.
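To make the agent-centric alignment idea concrete, the following is a minimal sketch of how per-agent features pooled from the BEV map could be pulled toward a frozen language model's embeddings of agent descriptions. The class name, the linear projection head, the feature dimensions, and the cosine-alignment loss are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AgentCentricAlignment(nn.Module):
    """Illustrative ALP-style objective: project per-agent BEV features into
    the (frozen) language model's embedding space and align them with text
    embeddings describing each agent."""

    def __init__(self, bev_dim: int = 256, text_dim: int = 768):
        super().__init__()
        # Small projection head mapping BEV agent features to the text space.
        self.proj = nn.Linear(bev_dim, text_dim)

    def forward(self, agent_bev_feats: torch.Tensor,
                agent_text_embeds: torch.Tensor) -> torch.Tensor:
        # agent_bev_feats:   (N, bev_dim)  features pooled from BEV at agent locations
        # agent_text_embeds: (N, text_dim) frozen LM embeddings of per-agent captions,
        #                    e.g. "a pedestrian crossing ahead of the ego vehicle"
        z = F.normalize(self.proj(agent_bev_feats), dim=-1)
        t = F.normalize(agent_text_embeds, dim=-1)
        # Pull each projected agent feature toward its caption embedding.
        return (1.0 - (z * t).sum(dim=-1)).mean()

# Usage with random tensors standing in for real BEV and LM features:
align = AgentCentricAlignment()
loss = align(torch.randn(8, 256), torch.randn(8, 768))
loss.backward()
```

In a setup like this, the alignment term would act as an auxiliary training loss on the BEV features; the language model itself stays frozen and only supplies target embeddings.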
In the ADS, the planning module aggregates information from the preceding perception and prediction stages to make the final driving decisions. This global perspective culminates in the formation of a planning query, which directly influences the safety and accuracy of self-driving navigation. Given the critical role of the planning module within the ADS, we also present a novel Self-driving-car-centric Learning Paradigm (SLP) to improve the planning query's ability to decode and aggregate information. In SLP, we align the planning query with the intended goals and the ego vehicle's driving status by leveraging the knowledge encoded in the pretrained language model, as illustrated in the sketch below. The language model's comprehension capabilities lead to more informed decision-making during the planning phase and a more robust planning query formation process.
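As an illustration of the self-driving-car-centric alignment, the sketch below aligns the planning query with a frozen language model embedding of the ego vehicle's driving status and navigation goal, and adds that term to the planner's own loss. The prompt content, projection head, loss weight, and function names are hypothetical placeholders for the idea described above, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def slp_alignment_loss(planning_query: torch.Tensor,
                       ego_text_embed: torch.Tensor,
                       proj: nn.Linear) -> torch.Tensor:
    """Cosine alignment between the planning query and a frozen LM embedding of
    the ego status / goal, e.g. "ego at 8 m/s, command: turn left ahead"."""
    q = F.normalize(proj(planning_query), dim=-1)
    t = F.normalize(ego_text_embed, dim=-1)
    return (1.0 - (q * t).sum(dim=-1)).mean()

# Hypothetical combined objective: the alignment term is added as an
# auxiliary loss on top of the planner's usual trajectory loss.
proj = nn.Linear(256, 768)                    # planning-query -> text space
planning_query = torch.randn(4, 256, requires_grad=True)
ego_text_embed = torch.randn(4, 768)          # stand-in for frozen LM output
planning_loss = torch.tensor(0.0)             # placeholder for the planner's own loss
total_loss = planning_loss + 0.5 * slp_alignment_loss(planning_query, ego_text_embed, proj)
total_loss.backward()
```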
Through VLP, we bridge the gap between human-like reasoning and autonomous driving, enhancing the model's contextual awareness and its ability to generalize effectively to complex, ever-changing real-world scenarios. The main contributions of this work are summarized as follows:
• We propose VLP, a Vision Language Planning model, which incorporates the reasoning capability of LLMs into vision-based autonomous driving systems to enhance motion planning and self-driving safety.