Generalized Predictive Model for Autonomous Driving

8 Aug 2024 | Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li
This paper introduces GenAD, a generalized predictive model for autonomous driving, trained on a large-scale multimodal dataset called OpenDV-2K. The dataset contains over 2000 hours of driving videos from around the world, covering diverse weather conditions and traffic scenarios. The model predicts future driving scenes from past visual and textual inputs and is validated on zero-shot domain transfer, language-conditioned prediction, action-conditioned prediction, and motion planning.

GenAD builds on recent latent diffusion models and adds novel temporal reasoning blocks to handle the complex dynamics of driving scenes. It generalizes to various unseen driving datasets and can be adapted into an action-conditioned prediction model or a motion planner, which makes it relevant to real-world applications. Training proceeds in two stages: the first transfers the generation distribution of a general text-to-image model to the driving domain, and the second trains a video prediction model with the proposed temporal reasoning blocks.

OpenDV-2K is constructed by collecting high-quality driving videos from YouTube and merging them with publicly licensed datasets, yielding a diverse and extensive dataset that covers a wide range of driving scenarios and sensor configurations. According to the paper, GenAD outperforms existing models in prediction quality and generalization ability on video prediction, language-conditioned prediction, and action-conditioned prediction. The paper also discusses the challenges of autonomous driving, in particular the limited generalization ability of learned models in structured autonomous driving systems.
GenAD addresses these challenges by combining a large-scale dataset with a predictive model that can generalize to new conditions and environments. The model is evaluated on various benchmarks and shows promising results in both simulation consistency and planning reliability. The paper concludes that GenAD has the potential to significantly advance autonomous driving technology by providing a scalable and generalizable predictive model.
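
To make the prediction setup more concrete, the following is a minimal sketch of how a driving video could be cut into training examples that pair past frames and a text condition with the future frames to be predicted. It is an illustration only, not the authors' data pipeline: the names `PredictionSample` and `make_prediction_samples`, the use of frame identifiers in place of images, and the window sizes (2 past frames, 4 future frames) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PredictionSample:
    """One hypothetical training example: observed context plus frames to predict."""
    past_frames: List[str]    # conditioning frames (frame identifiers stand in for images)
    future_frames: List[str]  # target frames the model should generate
    text: str                 # language condition, e.g. a scene or command description


def make_prediction_samples(
    frames: Sequence[str],
    text: str,
    num_past: int = 2,    # assumed value, chosen only for illustration
    num_future: int = 4,  # assumed value, chosen only for illustration
    stride: int = 1,
) -> List[PredictionSample]:
    """Slide a window over the frame sequence and split each window into past and future."""
    if num_past < 1 or num_future < 1 or stride < 1:
        raise ValueError("num_past, num_future and stride must be positive")
    window = num_past + num_future
    samples = []
    # Windows that would run past the end of the video are skipped.
    for start in range(0, len(frames) - window + 1, stride):
        clip = list(frames[start:start + window])
        samples.append(PredictionSample(clip[:num_past], clip[num_past:], text))
    return samples


frames = [f"frame_{i:03d}" for i in range(8)]
samples = make_prediction_samples(frames, "turn left at the intersection", stride=2)
for sample in samples:
    print(sample.past_frames, "->", sample.future_frames)
```

With eight frames, a six-frame window, and a stride of 2, this yields two overlapping samples. A real pipeline would load image tensors and per-clip text rather than strings, and would choose window sizes to match the model being trained.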