Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

19 Jun 2024 | Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez
Hydra-MDP is a novel end-to-end multimodal planning framework that uses multi-target Hydra-distillation to train a student model with knowledge from both human and rule-based teachers. The framework employs a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. By leveraging knowledge from rule-based teachers, Hydra-MDP learns how the environment influences planning in an end-to-end manner, avoiding non-differentiable post-processing. This approach achieved first place in the Navsim challenge, demonstrating significant improvements in generalization across diverse driving environments and conditions.
The framework consists of a Perception Network and a Trajectory Decoder. The Perception Network is based on the official challenge baseline Transfuser, which includes an image backbone, a LiDAR backbone, and perception heads for 3D object detection and BEV segmentation. The Trajectory Decoder constructs a fixed planning vocabulary to discretize the continuous action space, using K-means clustering to form the vocabulary. The framework uses a transformer-based architecture to process the vocabulary and environmental information.
Hydra-MDP introduces Multi-target Hydra-Distillation, a learning strategy that aligns the planner with simulation-based metrics. The distillation process involves running offline simulations and introducing supervision from simulation scores during training. This process distills rule-based driving knowledge into the end-to-end planner using a binary cross-entropy loss. Inference involves calculating an assembled cost based on imitation scores and metric sub-scores.
Model ensembling techniques, such as Mixture of Encoders and Sub-score Ensembling, are used to improve performance. The framework was evaluated on the Navsim dataset, achieving state-of-the-art performance.
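To make the planning vocabulary concrete, here is a minimal sketch of how K-means can turn a set of logged trajectories into a fixed set of anchor trajectories. It is an illustration under stated assumptions, not the authors' implementation: the function names (`kmeans_vocabulary`, `nearest_anchor`, `make`), the toy data, the trajectory length, and the vocabulary size of 3 are all hypothetical, and plain Lloyd-style K-means in pure Python stands in for whatever clustering setup the paper used.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two flattened trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_vocabulary(trajectories, k, iters=20, seed=0):
    """Cluster flattened trajectories into k anchors (the planning vocabulary)."""
    rng = random.Random(seed)
    # Initialise the centers from k randomly chosen trajectories.
    centers = [list(t) for t in rng.sample(trajectories, k)]
    for _ in range(iters):
        # Assignment step: attach every trajectory to its nearest center.
        clusters = [[] for _ in range(k)]
        for t in trajectories:
            idx = min(range(k), key=lambda i: sq_dist(t, centers[i]))
            clusters[idx].append(t)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def nearest_anchor(trajectory, vocabulary):
    """Index of the vocabulary entry closest to a continuous trajectory."""
    return min(range(len(vocabulary)),
               key=lambda i: sq_dist(trajectory, vocabulary[i]))

# Toy data (hypothetical): 4 waypoints (x, y) per trajectory, flattened to 8 numbers.
rng = random.Random(1)

def make(kind):
    lateral = {"straight": 0.0, "left": 1.0, "right": -1.0}[kind]
    return [v for step in range(1, 5)
            for v in (step + rng.gauss(0, 0.05),
                      lateral * step + rng.gauss(0, 0.05))]

logged = [make(kind) for kind in ("straight", "left", "right") for _ in range(30)]
vocab = kmeans_vocabulary(logged, k=3)
print(len(vocab), "anchors of length", len(vocab[0]))
```

Once the vocabulary is fixed, planning becomes a scoring problem over a finite candidate set: the decoder only has to rate each anchor rather than regress a continuous trajectory.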
The results show that Hydra-MDP outperforms the baseline, with improvements across different methods and enhanced performance through weighted confidence. The framework also demonstrates scalability with larger backbones and model ensembling.
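The distillation and inference steps described in the summary can be sketched as follows. This is a hedged illustration only: the summary states that a binary cross-entropy loss is used and that inference combines imitation scores with metric sub-scores, but it does not give the formulas. The negative weighted sum of log-scores used for `assembled_cost`, the metric names `no_collision` and `drivable_area`, the weights, and all numbers below are assumptions chosen for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(logit, target, eps=1e-7):
    """Binary cross-entropy between a predicted logit and a teacher score in [0, 1]."""
    p = min(max(sigmoid(logit), eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def distillation_loss(pred_logits, teacher_scores):
    """Mean BCE over every (metric, candidate) pair.

    Both arguments map a metric name to a list of per-candidate values;
    teacher_scores would come from offline simulation of each candidate.
    """
    total, count = 0.0, 0
    for metric, targets in teacher_scores.items():
        for logit, target in zip(pred_logits[metric], targets):
            total += bce(logit, target)
            count += 1
    return total / count

def assembled_cost(imitation_prob, sub_scores, weights, eps=1e-7):
    """Assumed form: negative weighted sum of log-scores (lower is better)."""
    cost = -weights["imitation"] * math.log(imitation_prob + eps)
    for metric, score in sub_scores.items():
        cost -= weights[metric] * math.log(score + eps)
    return cost

def select_trajectory(imitation_probs, sub_score_probs, weights):
    """Return the index of the candidate with the lowest assembled cost."""
    n = len(imitation_probs)
    costs = [
        assembled_cost(imitation_probs[i],
                       {m: s[i] for m, s in sub_score_probs.items()},
                       weights)
        for i in range(n)
    ]
    return min(range(n), key=costs.__getitem__)

# Training side: teacher scores for 3 candidates on 2 hypothetical metrics.
teacher = {"no_collision": [0.0, 1.0, 1.0], "drivable_area": [1.0, 1.0, 0.0]}
good = {"no_collision": [-3.0, 3.0, 3.0], "drivable_area": [3.0, 3.0, -3.0]}
bad = {m: [-v for v in vals] for m, vals in good.items()}
print(distillation_loss(good, teacher) < distillation_loss(bad, teacher))

# Inference side: the human-like candidate 0 is penalised for a likely collision.
imitation = [0.6, 0.3, 0.1]
predicted = {"no_collision": [0.05, 0.95, 0.9], "drivable_area": [0.9, 0.9, 0.2]}
weights = {"imitation": 1.0, "no_collision": 1.0, "drivable_area": 1.0}
best = select_trajectory(imitation, predicted, weights)
print(best)  # candidate 1 has the lowest assembled cost
```

Working in log space means a near-zero score on any single metric dominates the cost, so a candidate that closely imitates the human driver can still be rejected when a rule-based sub-score flags it. Sub-score ensembling could be sketched in the same style by averaging each metric's predicted scores across models before calling `select_trajectory`.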