[slides] A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook

This paper presents a comprehensive survey of 265 autonomous driving (AD) datasets, analyzing their characteristics, annotation quality, and impact. It evaluates the datasets from multiple perspectives, including sensor modalities, data size, tasks, and environmental conditions. The survey covers all tasks from perception to control, considers both real-world and synthetic data, and provides insights into the data modality and quality of several crucial datasets. A novel impact score metric is introduced to assess the significance of a dataset, which can also guide the creation of new datasets. The survey analyzes annotation processes, existing labeling tools, and annotation quality across various autonomous driving tasks, and it thoroughly examines the impact of geographical and adversarial environmental conditions on autonomous driving systems. The data distribution of several key datasets is discussed, along with their strengths and weaknesses. The paper also explores current challenges and future trends, such as integrating language into AD data, generating AD data using Vision Language Models, standardizing data creation, and promoting an open data ecosystem. The main contributions are an exhaustive survey of autonomous driving datasets, a systematic analysis of sensors and sensing domains, and the impact score metric. The paper is organized around dataset collection, evaluation metrics, sensors and perception technology, tasks in autonomous driving, high-influence datasets, and future trends. Overall, the survey highlights the importance of standardizing annotation processes and improving data diversity.
The study emphasizes the need for comprehensive and diverse datasets to enhance the robustness and generalizability of autonomous driving systems.

A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook

23 Apr 2024 | Mingyu Liu, Ekim Yurtsever, Jonathan Fossaert, Xingcheng Zhou, Walter Zimmer, Yuning Cui, Bare Luka Zagar, Alois C. Knoll