19 Jun 2024 | Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez
**Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation**
**Authors:** Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez
**Abstract:**
Hydra-MDP is a novel paradigm that uses multiple teachers in a teacher-student framework for end-to-end autonomous driving planning. The student model is trained by knowledge distillation from both human and rule-based teachers, and it features a multi-head decoder that learns diverse trajectory candidates tailored to various evaluation metrics. By leveraging rule-based teachers, Hydra-MDP learns how the environment influences planning in an end-to-end manner, avoiding non-differentiable post-processing. This method achieved 1st place in the Navsim challenge, demonstrating significant improvements in generalization across diverse driving environments and conditions.
**Introduction:**
End-to-end autonomous driving planning, which involves learning a neural planner with raw sensor inputs, is a promising direction for achieving full autonomy. However, recent studies have exposed vulnerabilities and limitations of imitation learning (IL) methods, particularly in open-loop evaluation. To address this, Hydra-MDP proposes a multi-target and multimodal planning approach, where the student model learns from both rule-based planners and human drivers using a multi-head decoder. This method integrates knowledge from specialized teachers and can be easily scaled by involving more cost functions or leveraging imitation similarity.
**Solution:**
Hydra-MDP consists of a Perception Network and a Trajectory Decoder. The Perception Network, based on the Transfuser model, extracts semantic information from images and LiDAR point clouds. The Trajectory Decoder uses a fixed planning vocabulary to discretize the continuous action space and incorporates environmental cues through transformer decoders. Multi-target Hydra-Distillation aligns the planner with simulation-based metrics: offline simulations score the vocabulary trajectories, and those simulation scores supervise the model during training.
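To make the decoder and distillation steps more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the vocabulary is random rather than built from logged trajectories, the metric names, the soft imitation target (softmax over negative squared distance to the human trajectory), the binary cross-entropy on each metric head, and the log-score combination at inference are all assumed here, and the logits would in practice come from the transformer decoder rather than being supplied by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 64, 8  # hypothetical sizes: K vocabulary trajectories, T (x, y) waypoints
METRICS = ("no_collision", "drivable_area", "progress")  # illustrative names only

# Stand-in for the fixed planning vocabulary (random here, for illustration).
vocabulary = rng.normal(size=(K, T, 2))


def log_softmax(logits):
    # Numerically stable log-softmax over a 1-D array.
    return logits - np.logaddexp.reduce(logits)


def log_sigmoid(logits):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -logits)


def imitation_target(vocab, expert):
    # Soft label over the vocabulary: entries closer to the human
    # trajectory receive more probability mass (assumed form).
    d2 = ((vocab - expert) ** 2).sum(axis=(1, 2))
    return np.exp(log_softmax(-d2))


def distillation_loss(imi_logits, metric_logits, expert, sim_scores, vocab):
    # Human teacher: cross-entropy against the soft imitation target.
    target = imitation_target(vocab, expert)
    loss = -(target * log_softmax(imi_logits)).sum()
    # Rule-based teachers: one head per metric, supervised by the
    # offline simulation score of every vocabulary trajectory.
    for name, x in metric_logits.items():
        t = sim_scores[name]
        bce = np.maximum(x, 0.0) - x * t + np.log1p(np.exp(-np.abs(x)))
        loss += bce.mean()
    return float(loss)


def select_trajectory(imi_logits, metric_logits, weights=None):
    # Combine the heads in log space and pick the best vocabulary index.
    score = log_softmax(imi_logits)
    for name, x in metric_logits.items():
        w = 1.0 if weights is None else weights[name]
        score = score + w * log_sigmoid(x)
    return int(np.argmax(score))
```

Because every candidate comes from a fixed set, the rule-based scores can be computed offline once per scene and used as ordinary regression targets, which is what lets the student absorb them without a non-differentiable post-processing step. At inference, `vocabulary[select_trajectory(...)]` would be the planned trajectory in this sketch.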
**Experiments:**
Hydra-MDP was evaluated on the Navsim dataset, which focuses on scenarios involving changes in intention. The model achieved state-of-the-art performance under simulation-based evaluation metrics, outperforming baselines and demonstrating the effectiveness of multi-target learning and model ensembling. The method's scalability was also demonstrated with larger backbones, showing minor improvements in planning performance.
**Contributions:**
1. A universal framework for end-to-end multimodal planning via multi-target hydra-distillation.
2. State-of-the-art performance under simulation-based evaluation metrics on Navsim.