Generalized Predictive Model for Autonomous Driving

8 Aug 2024 | Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li
The paper introduces GenAD, a large-scale video prediction model for autonomous driving that aims to establish a generalized video prediction paradigm. The model is trained on OpenDV-2K, the largest multimodal driving video dataset to date, containing over 2,000 hours of driving videos collected across diverse regions and conditions. GenAD is designed to handle the dynamic nature of driving scenes and generalizes to unseen datasets in a zero-shot manner, outperforming existing video prediction models. Its strong generalization and controllability are validated on a range of tasks, including zero-shot domain transfer, language-conditioned prediction, action-conditioned prediction, and motion planning. The paper also discusses the challenges and limitations of the current design and suggests future directions, such as adopting more advanced vision-language models and more efficient sampling methods.
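To make the conditioned-prediction setup concrete, the following is a minimal sketch, not the authors' implementation, of what an action-conditioned rollout interface for a GenAD-style model might look like. The class name, the predict_next method, and the frame/action shapes are all hypothetical assumptions; the real model is a large video generation network, while this stub only illustrates the input/output contract.

```python
import numpy as np

class ConditionedVideoPredictor:
    """Hypothetical interface for a GenAD-style conditioned predictor.

    Given a window of past frames and an optional condition (a text
    prompt or a future ego action), it predicts the next frame. This
    stub stands in for the paper's large video prediction model and
    only demonstrates the conditioning contract.
    """

    def __init__(self, context_len=4):
        self.context_len = context_len

    def predict_next(self, frames, action=None, prompt=None):
        # frames: (context_len, H, W, 3) past RGB frames in [0, 1].
        # action: optional (dx, dy) ego displacement for the next step.
        # prompt: optional language condition, e.g. "turn left at night".
        assert frames.shape[0] == self.context_len
        # Placeholder dynamics: shift the last frame by the commanded
        # action so the rollout visibly responds to the condition.
        next_frame = frames[-1].copy()
        if action is not None:
            dx, dy = action
            next_frame = np.roll(next_frame, shift=(dy, dx), axis=(0, 1))
        return next_frame


def rollout(model, frames, actions):
    """Autoregressive action-conditioned rollout, as a planner might use."""
    history = list(frames)
    for act in actions:
        context = np.stack(history[-model.context_len:])
        history.append(model.predict_next(context, action=act))
    return np.stack(history[len(frames):])


if __name__ == "__main__":
    model = ConditionedVideoPredictor(context_len=4)
    past = np.random.rand(4, 64, 96, 3)   # four past frames (toy data)
    plan = [(2, 0), (2, 0), (1, 1)]       # hypothetical ego actions
    future = rollout(model, past, plan)
    print(future.shape)                   # (3, 64, 96, 3)
```

The point of the stub is the contract rather than the dynamics: conditioning signals (language or action) enter alongside past frames, and a downstream planner can compare candidate action sequences by rolling out and scoring their predicted futures.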