[slides] Joint 3D Proposal Generation and Object Detection from View Aggregation

AVOD (Aggregate View Object Detection) is a 3D object detection network for autonomous driving. It fuses a bird's eye view (BEV) representation of the LIDAR point cloud with RGB images to produce feature maps shared by two subnetworks: a region proposal network (RPN) and a second-stage detector. The RPN uses a novel architecture that performs multimodal feature fusion on high-resolution feature maps to generate reliable 3D object proposals for multiple object classes. The second-stage detector then refines these proposals with oriented 3D bounding box regression and category classification. At publication, AVOD reported state-of-the-art results on the KITTI 3D object detection benchmark while running in real time with a low memory footprint, which makes it suitable for deployment on autonomous vehicles.
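The multimodal fusion step in the RPN can be illustrated with a minimal sketch. It assumes that each 3D anchor has already been projected to a 2D box on both the BEV and the image feature maps, that each feature map has a single channel, and that nearest-neighbour sampling stands in for the usual bilinear crop-and-resize. Crops from the two views are resampled to the same fixed size and fused with an element-wise mean; treat that fusion operator, and the helper names `crop_and_resize` and `fuse_mean`, as illustrative assumptions rather than the authors' code.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # single-channel H x W map, kept 2D for brevity
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in feature-map pixels


def crop_and_resize(fmap: FeatureMap, box: Box, out_size: int) -> FeatureMap:
    """Hypothetical helper: crop `box` from `fmap` and resample it to
    out_size x out_size. Uses nearest-neighbour sampling as a simplification
    of bilinear crop-and-resize."""
    x1, y1, x2, y2 = box
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_size):
        # Centre of the i-th output row, mapped back into the crop window.
        y = y1 + (i + 0.5) * (y2 - y1) / out_size
        r = min(max(int(y), 0), h - 1)  # clamp to valid rows
        row = []
        for j in range(out_size):
            x = x1 + (j + 0.5) * (x2 - x1) / out_size
            c = min(max(int(x), 0), w - 1)  # clamp to valid columns
            row.append(fmap[r][c])
        out.append(row)
    return out


def fuse_mean(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Hypothetical helper: element-wise mean of two equally sized crops."""
    return [[(p + q) / 2.0 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


# Usage: one anchor whose (assumed) projections cover each full 4 x 4 map.
bev = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
img = [[1.0] * 4 for _ in range(4)]
bev_crop = crop_and_resize(bev, (0, 0, 4, 4), 2)  # [[5.0, 7.0], [13.0, 15.0]]
img_crop = crop_and_resize(img, (0, 0, 4, 4), 2)  # [[1.0, 1.0], [1.0, 1.0]]
fused = fuse_mean(bev_crop, img_crop)             # [[3.0, 4.0], [7.0, 8.0]]
```

Resampling both crops to one fixed size is what makes the element-wise fusion possible, since an anchor's BEV and image projections generally cover different numbers of pixels.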
Two design choices drive its accuracy. First, a high-resolution feature extractor paired with the multimodal fusion RPN produces accurate region proposals, especially for small classes such as pedestrians and cyclists, which would otherwise occupy only a few pixels in a heavily downsampled feature map. Second, explicit orientation vector regression resolves the ambiguous orientation estimates that a bounding box alone provides. On KITTI, AVOD outperforms competing methods in 3D localization, orientation estimation, and category classification, particularly in challenging conditions, and ablation studies validate the contribution of each architectural component.
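Orientation vector regression can be sketched in a few lines. The heading θ is encoded as the unit vector (cos θ, sin θ), which avoids the discontinuity at ±π and, unlike an angle taken modulo π, distinguishes a vehicle facing forward from one facing backward. For illustration only, the sketch assumes the regressed box pins the heading down to one of four candidates spaced π/2 apart and picks the candidate closest to the regressed vector; the function names are hypothetical.

```python
import math
from typing import Tuple


def encode_orientation(theta: float) -> Tuple[float, float]:
    """Regression target for a heading theta (radians): (cos theta, sin theta)."""
    return math.cos(theta), math.sin(theta)


def decode_orientation(x_t: float, y_t: float) -> float:
    """Recover a heading in [-pi, pi] from a regressed vector.
    atan2 ignores the vector's length, so the output need not be unit-norm."""
    return math.atan2(y_t, x_t)


def wrap_angle(a: float) -> float:
    """Wrap any angle into [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def resolve_box_heading(box_theta: float, x_t: float, y_t: float) -> float:
    """Assumption for this sketch: the box fixes the heading only up to a
    multiple of pi/2. Return the candidate nearest the regressed vector."""
    target = decode_orientation(x_t, y_t)
    candidates = [box_theta + k * math.pi / 2 for k in range(4)]
    best = min(candidates, key=lambda c: abs(wrap_angle(c - target)))
    return wrap_angle(best)


# Usage: the box suggests 0.3 rad, but the regressed vector points the
# opposite way, so the flipped candidate (0.3 + pi, wrapped) is chosen.
heading = resolve_box_heading(0.3, *encode_orientation(0.3 + math.pi))
```

Decoding with `atan2` means the network's raw output does not have to be normalised to unit length before a heading is read off.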

12 Jul 2018 | Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, and Steven L. Waslander