DRIVEVLM: The Convergence of Autonomous Driving and Large Vision-Language Models


25 Jun 2024 | Xiaoyu Tian*, Junru Gu*, Bailin Li*, Yicheng Liu*, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao*
DriveVLM is an autonomous driving system that leverages Vision-Language Models (VLMs) to enhance scene understanding and planning. It chains reasoning modules for scene description, scene analysis, and hierarchical planning. DriveVLM-Dual is a hybrid system that pairs the VLM with a traditional autonomous driving pipeline to compensate for the VLM's weaknesses in spatial reasoning and real-time planning.

The paper introduces a new task, Scene Understanding for Planning (SUP), along with evaluation metrics and the SUP-AD dataset, which was built with a comprehensive data mining and annotation pipeline.

Experiments on the nuScenes and SUP-AD datasets show that DriveVLM and DriveVLM-Dual effectively handle complex and unpredictable driving scenarios, including long-tail objects. DriveVLM outperforms other models in few-shot settings, and DriveVLM-Dual surpasses state-of-the-art end-to-end motion planning methods. DriveVLM-Dual was also deployed on a production vehicle, confirming its effectiveness in real-world driving environments.

The paper concludes that DriveVLM and DriveVLM-Dual significantly advance autonomous driving by combining VLMs with traditional methods.
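To make the described architecture concrete, below is a minimal Python sketch of the control flow: the slow VLM branch reasons in three stages (scene description, scene analysis, hierarchical planning), and the fast traditional planner refines the VLM's coarse trajectory in real time, as in DriveVLM-Dual. All names here (query_vlm, Plan, refine_with_classical_planner, etc.) are hypothetical placeholders for illustration, not the authors' actual API or code.

```python
# Illustrative sketch of the DriveVLM / DriveVLM-Dual control flow.
# All identifiers are hypothetical stand-ins; the real system wires a large
# vision-language model and a production planner into this overall shape.
from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in ego-vehicle coordinates


@dataclass
class Plan:
    meta_actions: List[str]     # e.g. ["decelerate", "yield to cyclist"]
    waypoints: List[Waypoint]   # coarse, low-frequency trajectory


def query_vlm(prompt: str, images: List[bytes]) -> str:
    """Placeholder for a call to a vision-language model."""
    return "decelerate; yield to the cyclist merging from the right"


def drivevlm_plan(images: List[bytes]) -> Plan:
    """Chain-of-thought reasoning: describe -> analyze -> plan hierarchically."""
    description = query_vlm("Describe the driving scene.", images)
    analysis = query_vlm(
        f"Given: {description}. Analyze the critical objects and how they "
        "influence the ego vehicle.", images)
    decision = query_vlm(
        f"Given: {analysis}. Propose meta-actions and a coarse trajectory.",
        images)
    # In practice the VLM's textual output is parsed into structured
    # meta-actions and waypoints; fixed values stand in for that step here.
    return Plan(meta_actions=decision.split("; "),
                waypoints=[(0.0, 0.0), (2.0, 0.1), (4.0, 0.3)])


def refine_with_classical_planner(coarse: Plan, hz: float = 10.0) -> List[Waypoint]:
    """Stand-in for the fast branch of DriveVLM-Dual: a traditional planner
    treats the VLM's coarse waypoints as a reference and re-plans at high
    frequency with full 3D perception. Here we merely densify the path."""
    refined: List[Waypoint] = []
    for (x0, y0), (x1, y1) in zip(coarse.waypoints, coarse.waypoints[1:]):
        for i in range(int(hz)):
            t = i / hz
            refined.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    refined.append(coarse.waypoints[-1])
    return refined


if __name__ == "__main__":
    frames = [b""]                                    # camera frames would go here
    coarse = drivevlm_plan(frames)                    # slow VLM branch
    trajectory = refine_with_classical_planner(coarse)  # fast real-time branch
    print(coarse.meta_actions)
    print(f"{len(trajectory)} refined waypoints")
```

The key design point this sketch illustrates is the division of labor: the VLM runs at low frequency to handle long-tail, hard-to-formalize scenarios, while the classical planner keeps the vehicle's control loop at real-time rates.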