5 Feb 2024 | Ahmed Ghita, Bjørk Antoniussen, Walter Zimmer, Ross Greer, Christian Creß, Andreas Møgelmose, Mohan M. Trivedi, Alois C. Knoll
ActiveAnno3D is an active learning framework for multi-modal 3D object detection, designed to reduce annotation costs while maintaining high detection performance. The framework selects the most informative samples for labeling, optimizing both computational efficiency and detection accuracy, and integrates with the proAnno labeling tool to enable AI-assisted data selection and labeling, minimizing manual annotation effort. Evaluated on the nuScenes and TUM Traffic Intersection datasets, ActiveAnno3D achieves performance comparable to training on the full dataset while using only half of the training data: PV-RCNN reaches 77.25 mAP with half the data versus 83.50 mAP with the full data, and BEVFusion reaches 64.31 mAP with half the data versus 75.0 mAP with the full data. The framework supports multi-modal approaches that combine LiDAR and camera data for improved detection; active learning was applied to both LiDAR-only and multi-modal 3D object detectors, with entropy-based query strategies proving effective in certain scenarios. The framework also explores continuous training strategies to reduce computational costs. The results show that active learning can significantly reduce annotation costs without compromising performance, making it a valuable approach for efficient 3D object detection in autonomous driving. Code, model weights, and visualization results are available on the official website.
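To make the entropy-based query strategy concrete, here is a minimal sketch of uncertainty sampling over an unlabeled pool; it is an illustration of the general technique, not the authors' implementation, and names such as detector.predict_class_probs and acquire_batch are hypothetical.

```python
import numpy as np

def sample_entropy(class_probs: np.ndarray) -> float:
    """Mean Shannon entropy over all detections in one frame.

    class_probs: (num_detections, num_classes) softmax scores
    produced by the detector for an unlabeled frame.
    """
    eps = 1e-12  # guard against log(0)
    per_det = -np.sum(class_probs * np.log(class_probs + eps), axis=-1)
    return float(per_det.mean()) if len(per_det) else 0.0

def acquire_batch(detector, unlabeled_pool, batch_size):
    """Rank unlabeled frames by predictive entropy and return the
    most uncertain ones for annotation (hypothetical detector API)."""
    scores = [sample_entropy(detector.predict_class_probs(f))
              for f in unlabeled_pool]
    ranked = np.argsort(scores)[::-1]  # highest entropy first
    return [unlabeled_pool[i] for i in ranked[:batch_size]]
```

Frames whose detections have the most uniform class scores receive the highest entropy and are queried first, which is what "selecting the most informative samples" amounts to under this strategy.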
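The continuous training strategy can likewise be sketched as an active learning loop that resumes from the previous round's weights instead of retraining from scratch; this is an assumption about what "continuous training" entails here, and annotate and detector.fit(..., warm_start=True) are hypothetical helpers.

```python
def active_learning_loop(detector, labeled, unlabeled, rounds, batch_size):
    """Iterative active learning with warm-start (continuous) training:
    each round fine-tunes the current weights on the grown labeled set,
    reducing per-round compute compared to full retraining."""
    for _ in range(rounds):
        # Query the most informative frames (e.g. by entropy, see above).
        batch = acquire_batch(detector, unlabeled, batch_size)
        # An annotator (here: proAnno-assisted) labels the queried frames.
        labeled += [annotate(frame) for frame in batch]  # hypothetical helper
        unlabeled = [f for f in unlabeled if f not in batch]
        # Continuous training: resume from the current checkpoint.
        detector.fit(labeled, warm_start=True)  # hypothetical API
    return detector
```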