TUMTraf V2X Cooperative Perception Dataset


2 Mar 2024 | Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan, Xingcheng Zhou, Rui Song, Alois C. Knoll
The TUMTraf V2X Cooperative Perception Dataset is a multi-modal, multi-view dataset for cooperative 3D object detection and tracking. It comprises 2,000 labeled point clouds and 5,000 labeled images captured by five roadside and four onboard sensors, with 30,000 3D bounding boxes carrying track IDs and accompanied by precise GPS and IMU data. The annotations cover eight object categories and span challenging driving scenarios such as traffic violations, near-miss events, overtaking maneuvers, and U-turns, including rare events like pedestrians crossing a busy four-way intersection while the crossing light is red.

All nine sensors observe the same traffic scenes under diverse weather and lighting conditions. The infrastructure sensors face all four approaches of the intersection, providing a 360-degree view that reduces occlusions and improves perception results. Labels were produced through a careful annotation and review process that accounts for the core challenges of cooperative perception: pose estimation errors, latency, and synchronization.

The dataset is accompanied by CoopDet3D, a cooperative multi-modal fusion model that improves 3D mAP by +14.36 over a vehicle-only camera-LiDAR fusion baseline. The dataset, model, labeling tool, and dev-kit are publicly available. The dev-kit loads annotations in the standardized OpenLABEL format and provides modules for preprocessing, visualization, and evaluation of perception and tracking methods.
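To illustrate what the OpenLABEL annotations look like, the following is a minimal sketch that parses 3D cuboids from an OpenLABEL-style JSON file using only the Python standard library. The file name and the exact field layout (openlabel → frames → objects → object_data → cuboid) follow the general ASAM OpenLABEL convention and are assumptions here; the official dev-kit ships its own loader, which may differ in detail.

import json

# Sketch: read 3D cuboid annotations from an OpenLABEL-style JSON file.
# Field names are assumptions based on the ASAM OpenLABEL convention,
# not the dev-kit's actual API.
def load_cuboids(path):
    with open(path, "r") as f:
        data = json.load(f)["openlabel"]

    boxes = []
    for frame_id, frame in data.get("frames", {}).items():
        for obj_id, obj in frame.get("objects", {}).items():
            for cuboid in obj.get("object_data", {}).get("cuboid", []):
                # An OpenLABEL cuboid "val" typically encodes position,
                # orientation (quaternion), and size in one flat list.
                boxes.append({
                    "frame": frame_id,
                    "track_id": obj_id,
                    "val": cuboid["val"],
                })
    return boxes

if __name__ == "__main__":
    for box in load_cuboids("example_openlabel.json")[:5]:
        print(box)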
Benchmarks against existing V2X datasets show that cooperative perception models outperform single-viewpoint models, and cooperative tracking likewise outperforms single-viewpoint tracking. In an ablation of backbones for cooperative camera-LiDAR fusion, the best configuration is PointPillars 512.2x with YOLOv8, and deep fusion methods outperform late fusion methods. Overall, CoopDet3D surpasses the compared models in both accuracy and efficiency. The 3D BAT v24.3.2 labeling tool proved effective for annotating 3D objects, and multi-modal cooperative data augmentation further improves model performance. Detection results are reported using the mAP metric.
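To make the late-versus-deep fusion comparison concrete, here is a minimal, illustrative late-fusion step: detections from the vehicle and the infrastructure, already transformed into a common reference frame using the recorded poses, are merged by keeping the higher-scoring box when two detections are near-duplicates. The box format, score handling, and the 1.0 m merge radius are assumptions made for this sketch, not values from the paper, and CoopDet3D itself fuses at the feature level (deep fusion) rather than at the box level.

import numpy as np

# Illustrative late fusion: merge vehicle and infrastructure detections
# that already live in a shared reference frame.
# Boxes are [x, y, z, score]; the merge radius is an arbitrary choice.
def late_fuse(vehicle_boxes, infra_boxes, merge_radius=1.0):
    merged = np.vstack([vehicle_boxes, infra_boxes])
    merged = merged[np.argsort(-merged[:, 3])]  # sort by score, descending
    kept = []
    for box in merged:
        # Suppress a box if a higher-scoring box is already kept nearby.
        if all(np.linalg.norm(box[:3] - k[:3]) > merge_radius for k in kept):
            kept.append(box)
    return np.array(kept)

# Example: the same object seen from both viewpoints.
vehicle = np.array([[10.0, 2.0, 0.5, 0.80]])
infra = np.array([[10.3, 2.1, 0.5, 0.95]])
print(late_fuse(vehicle, infra))  # only the higher-scoring detection survives

In deep fusion, by contrast, intermediate features from both viewpoints are combined before the detection head; the benchmark results above indicate that this feature-level exchange outperforms box-level merging of the kind sketched here.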