21 Jun 2024 | Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Andreas Geiger, Kashyap Chitta
**Abstract:**
Benchmarking vision-based driving policies is challenging due to the difficulty in balancing open-loop and closed-loop evaluations. Traditional metrics like average displacement error (ADE) do not account for the interactive and multi-modal nature of driving. This paper introduces NAVSIM, a framework that combines large datasets with a non-reactive simulator to enable large-scale real-world benchmarking. NAVSIM uses simulation-based metrics, such as progress and time to collision, to provide more meaningful evaluations of trajectory outputs from sensor-based driving policies. The non-reactive simulation decouples the evaluated policy and environment, allowing for open-loop metric computation while being better aligned with closed-loop evaluations. The framework includes a detailed analysis of popular end-to-end driving models and a competition held at CVPR 2024, where 143 teams submitted 463 entries, leading to several new insights. Simple methods with moderate compute requirements, such as TransFuser, can match recent large-scale end-to-end driving architectures like UniAD. The modular framework can be extended with new datasets, data curation strategies, and metrics, and will be continually maintained to host future challenges.
**Introduction:**
Autonomous vehicles (AVs) have gained significant research interest due to their potential to improve transportation and traffic safety. However, evaluating driving performance is challenging due to the complexity of the task and the limitations of existing benchmarks. Traditional benchmarks often focus on visual diversity and label quality rather than the relevance of data for planning tasks. Existing metrics, such as ADE, often misrepresent the relative accuracy of trajectories. Additionally, the lack of standardized evaluation setups and the domain gap between synthetic and real-world driving data hinder progress in AV research.
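To make the ADE criticism concrete, here is a minimal sketch of the metric and of why it can misrepresent relative accuracy: two trajectories with very different safety implications receive identical scores. The function and example data are illustrative, not from the paper.

```python
import numpy as np

def ade(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average displacement error: mean L2 distance between predicted
    and ground-truth waypoints over the prediction horizon."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Ground-truth waypoints (x, y) in meters, straight-ahead driving.
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])

# Two hypothetical predictions with the same per-waypoint offset:
lateral = gt + np.array([0.0, 0.5])        # drifts 0.5 m sideways (toward another lane)
longitudinal = gt + np.array([0.5, 0.0])   # merely lags 0.5 m behind

print(ade(lateral, gt), ade(longitudinal, gt))  # both 0.5
```

ADE scores both predictions identically at 0.5 m, even though a constant lateral drift may leave the drivable area while a small longitudinal lag is typically harmless. Simulation-based metrics such as those in NAVSIM distinguish these cases.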
**Contributions:**
1. NAVSIM, a framework for non-reactive AV simulation with standardized protocols for training and testing, data curation tools, and an official public evaluation server.
2. Development of configurable simulation-based metrics suitable for evaluating sensor-based motion planning.
3. Reimplementation of popular end-to-end approaches for NAVSIM, demonstrating the potential of simple models in challenging scenarios.
**Related Work:**
End-to-end driving models streamline the entire stack from perception to planning into a single network, eliminating the need for manual design of intermediate representations. Various end-to-end models have emerged, focusing on closed-loop simulators and sensor-based approaches. Traditional benchmarks, such as nuScenes, are primarily used for perception tasks and do not fully capture the complexity of driving tasks.
**NAVSIM: Non-Reactive Autonomous Vehicle Simulation:**
NAVSIM combines the ease of use of open-loop benchmarks with metrics based on closed-loop simulators. It uses a non-reactive simulation where the evaluated policy and environment do not influence each other. This allows for the computation of open-loop metrics while being better aligned with closed-loop evaluations. The framework includes a detailed description of the task and metrics, as well as a filtering method to obtain standardized train and test splits covering challenging scenes.
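A minimal sketch of how such simulation-based metrics can be combined into a single score: hard safety terms (no at-fault collision, drivable-area compliance) act as multiplicative gates, while softer terms (ego progress, time to collision, comfort) enter a weighted average. The function name and weights below are illustrative assumptions, not NAVSIM's official definition.

```python
def composite_driving_score(no_collision: float, drivable_area: float,
                            ego_progress: float, ttc_within_bound: float,
                            comfort: float) -> float:
    """Composite score in the spirit of NAVSIM's simulation-based metrics.

    All inputs are in [0, 1]. Hard penalties (collision, leaving the
    drivable area) zero out the score; soft terms are averaged with
    illustrative weights.
    """
    soft = (5.0 * ego_progress + 5.0 * ttc_within_bound + 2.0 * comfort) / 12.0
    return no_collision * drivable_area * soft

# A flawless drive scores 1.0; any collision zeroes the score
# regardless of progress or comfort.
print(composite_driving_score(1.0, 1.0, 1.0, 1.0, 1.0))  # 1.0
print(composite_driving_score(0.0, 1.0, 1.0, 1.0, 1.0))  # 0.0
```

Because the environment is non-reactive, every term can be computed by unrolling the policy's trajectory output against the recorded log, which is what makes the evaluation both open-loop cheap and closer to closed-loop behavior.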
**Ex