Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception


6 Jun 2024 | Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Anouar Laouichi, Martin Hofmann, Gerhard Rigoll
HyDRa is a novel camera-radar fusion architecture designed to enhance 3D perception for autonomous driving. It addresses the limitations of camera-only systems, which struggle at long detection ranges and in adverse conditions, by integrating radar data to improve depth prediction accuracy. The architecture combines complementary camera and radar features in two distinct representation spaces: the perspective view and the Bird's-Eye View (BEV). Key contributions include:

1. **Height Association Transformer (HAT)**: This module leverages radar features in the perspective view to produce more robust and accurate depth predictions.
2. **Radar-weighted Depth Consistency (RDC)**: This refinement step enhances sparse fusion features in BEV by aligning radar and camera features, improving depth consistency.
3. **State-of-the-art performance**: HyDRa achieves new state-of-the-art results on the nuScenes benchmark, with 64.2 NDS for detection and 58.4 AMOTA for tracking, outperforming previous camera-based methods by significant margins.
4. **Occupancy prediction**: HyDRa's fused BEV representation is used for 3D semantic occupancy prediction on the Occ3D benchmark, outperforming all previous camera-based methods by 3.7 mIoU.

The paper also discusses the challenges and limitations of camera-radar fusion, emphasizing the need for a unified fusion paradigm that exploits the strengths of both modalities. HyDRa's architecture is designed to handle dynamic objects and occlusions, making it a promising direction for future autonomous driving systems.
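To make the two fusion steps concrete, below is a minimal, hypothetical PyTorch sketch of the underlying ideas: camera features along an image column cross-attend to radar returns in the perspective view (in the spirit of the Height Association Transformer), and the resulting categorical depth distribution is blended with a sparse radar depth prior (in the spirit of Radar-weighted Depth Consistency). All module names, tensor shapes, and the mixing weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, not the authors' code: illustrates cross-attending
# per-column camera features to radar pillar features (HAT-like), then
# re-weighting the predicted depth distribution with a radar-derived prior
# (RDC-like). Shapes, names, and hyperparameters are assumptions.
import torch
import torch.nn as nn


class HeightAssociationSketch(nn.Module):
    """Cross-attend camera features along the image height to radar returns
    projected into the same image column, so radar range cues can inform
    every pixel in that column."""

    def __init__(self, dim: int = 64, num_heads: int = 4, depth_bins: int = 64):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.depth_head = nn.Linear(dim, depth_bins)  # categorical depth distribution

    def forward(self, cam_col: torch.Tensor, radar_col: torch.Tensor) -> torch.Tensor:
        # cam_col:   (B*W, H, C) camera features of one image column
        # radar_col: (B*W, R, C) radar returns falling into the same column
        fused, _ = self.cross_attn(query=cam_col, key=radar_col, value=radar_col)
        fused = fused + cam_col                    # residual connection
        return self.depth_head(fused).softmax(-1)  # (B*W, H, D) per-pixel depth dist.


def radar_weighted_depth(depth_dist: torch.Tensor, radar_prior: torch.Tensor,
                         alpha: float = 0.5) -> torch.Tensor:
    """Blend the camera depth distribution with a (sparse) radar depth prior.
    alpha is an assumed mixing weight, not a value from the paper."""
    blended = (1 - alpha) * depth_dist + alpha * radar_prior
    return blended / blended.sum(-1, keepdim=True)  # renormalise to a distribution


if __name__ == "__main__":
    B_W, H, R, C, D = 8, 32, 5, 64, 64
    hat = HeightAssociationSketch(dim=C, depth_bins=D)
    cam = torch.randn(B_W, H, C)
    radar = torch.randn(B_W, R, C)
    depth = hat(cam, radar)                                 # (8, 32, 64)
    # one-hot radar depth prior per pixel, purely for demonstration
    prior = torch.zeros(B_W, H, D).scatter_(-1, torch.randint(0, D, (B_W, H, 1)), 1.0)
    print(radar_weighted_depth(depth, prior).shape)         # torch.Size([8, 32, 64])
```

The sketch only conveys the general flow (perspective-view association followed by radar-guided depth refinement); the paper's actual modules, feature lifting to BEV, and training losses are more involved.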