[slides] Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving

Bench2Drive is a benchmark for evaluating end-to-end autonomous driving (E2E-AD) systems in a closed-loop manner, intended as a comprehensive, realistic, and fair testing environment for Full Self-Driving (FSD) systems. It includes a large-scale, fully annotated dataset of 2 million frames collected from 10,000 short clips across 44 interactive scenarios, 23 weather conditions, and 12 towns in CARLA v2. The evaluation protocol requires E2E-AD models to pass the 44 interactive scenarios under different locations and weather conditions, giving 220 routes that provide a comprehensive and disentangled assessment of driving capability in different situations.

The benchmark includes a multi-ability evaluation toolkit for granular assessment of driving skills. It evaluates E2E-AD models with two metrics: Success Rate (SR) and Driving Score (DS). SR is the proportion of routes completed successfully without traffic violations, while DS accounts for both route completion and penalties for infractions. Several state-of-the-art E2E-AD methods are evaluated in Bench2Drive: UniAD, VAD, AD-MLP, TCP, ThinkTwice, and DriveAdapter.

Bench2Drive addresses a limitation of existing evaluation methodologies for E2E-AD systems, which often rely on open-loop metrics that do not fully reflect how an algorithm actually drives; its closed-loop protocol allows a more accurate assessment of an AD system's driving performance. The large-scale, annotation-rich official training dataset, collected by the expert model Think2Drive, ensures that all AD systems are trained under abundant yet similar conditions, enabling fair algorithm-level comparisons. The benchmark provides insights into the current status and future directions of E2E-AD systems.
It highlights the importance of closed-loop evaluation for assessing AD systems in complex, interactive traffic scenarios. The benchmark also addresses challenges of evaluating AD systems in real-world settings, where data is often imbalanced and environments are dynamic. Its structured, focused evaluation framework allows a detailed analysis of how different AD systems perform on individual tasks, enabling targeted improvements and more refined technology development.
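
The two metrics described above can be illustrated with a small Python sketch. This is not the official Bench2Drive toolkit: the `RouteResult` record, the scenario names, and the helper functions are hypothetical. The summary only says that DS combines route completion with an infraction penalty, so the multiplicative form used here (completion times penalty, averaged over routes, in the style of the CARLA Leaderboard) is an assumption. The per-scenario breakdown is a rough stand-in for the idea of disentangled, per-skill assessment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RouteResult:
    """Outcome of one closed-loop route (hypothetical record layout)."""
    scenario: str              # illustrative scenario label, e.g. "cut_in"
    route_completion: float    # fraction of the route driven, in [0, 1]
    infraction_penalty: float  # ASSUMED multiplicative penalty in (0, 1]; 1.0 = no infractions


def is_success(r: RouteResult) -> bool:
    # A route counts as a success only if it is fully completed with no infractions.
    return r.route_completion >= 1.0 and r.infraction_penalty >= 1.0


def success_rate(results: list[RouteResult]) -> float:
    # Proportion of successful routes; expects a non-empty list.
    return sum(is_success(r) for r in results) / len(results)


def driving_score(results: list[RouteResult]) -> float:
    # ASSUMPTION: per-route score = completion * penalty, averaged, scaled to 0-100.
    total = sum(r.route_completion * r.infraction_penalty for r in results)
    return 100.0 * total / len(results)


def success_rate_by_scenario(results: list[RouteResult]) -> dict[str, float]:
    # Group routes by scenario so that each situation is scored separately.
    groups: dict[str, list[RouteResult]] = defaultdict(list)
    for r in results:
        groups[r.scenario].append(r)
    return {name: success_rate(rs) for name, rs in groups.items()}


# Made-up results for four routes, purely for illustration.
demo = [
    RouteResult("cut_in", 1.0, 1.0),      # completed cleanly
    RouteResult("cut_in", 1.0, 0.6),      # completed, but with an infraction
    RouteResult("overtaking", 0.5, 1.0),  # stopped halfway, no infraction
    RouteResult("overtaking", 1.0, 1.0),  # completed cleanly
]
print(f"SR = {success_rate(demo):.2f}")
print(f"DS = {driving_score(demo):.1f}")
print(success_rate_by_scenario(demo))
```

In this toy data, a route that finishes with an infraction still contributes to DS but not to SR, which shows why the two metrics can rank systems differently.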

Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving

11 Jun 2024 | Xiaosong Jia*, Zhenjie Yang*, Qifeng Li*, Zhiyuan Zhang*, Junchi Yan†
