Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception

6 Jun 2024 | Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Anouar Laouichi, Martin Hofmann, Gerhard Rigoll
HyDRa is a camera-radar fusion architecture for diverse 3D perception tasks, designed to improve depth prediction and to reach state-of-the-art performance in 3D object detection and semantic occupancy prediction. It introduces a hybrid fusion approach that combines the complementary strengths of camera and radar features in two distinct representation spaces. In the perspective view, the Height Association Transformer (HAT) module leverages radar features to produce more robust and accurate depth predictions; in the Bird's Eye View (BEV), Radar-weighted Depth Consistency (RDC) refines the initial sparse representation, improving the fusion of sparse features and recovering misaligned or occluded objects. Illustrative sketches of these mechanisms follow below.

HyDRa achieves a new state of the art on the nuScenes dataset with 64.2 NDS and 58.4 AMOTA, and outperforms previous camera-based methods on the Occ3D benchmark by 3.7 mIoU. Its BEV features can be converted directly into a powerful occupancy representation, enabling effective 3D semantic occupancy prediction from the same backbone.

The architecture comprises modality-specific feature encoders, unified depth prediction, BEV fusion, radar-guided backward projection, and downstream task heads. Its core contributions, the HAT module, RDC, and the hybrid fusion design, improve both depth estimation and downstream task performance. HyDRa performs strongly in 3D object detection, multi-object tracking, and occupancy prediction, and copes with challenging scenarios such as night conditions, making it a promising solution for efficient and accurate 3D perception in autonomous driving.
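To make the HAT idea concrete, here is a minimal PyTorch sketch of column-wise cross-attention between perspective-view camera features and radar features, feeding a categorical depth head. Everything below (the module name, the choice of image features as queries with radar pillars splatted into the image plane as keys/values, and the lift-splat-style depth head) is an illustrative assumption, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class HeightAssociationSketch(nn.Module):
    """Minimal sketch of a HAT-style column-wise cross-attention.

    Assumption (not from the summary above): image features act as queries
    over the height axis of each column, radar pillar features projected
    into the same column act as keys/values, and the fused feature feeds a
    categorical depth head, as in lift-splat-style view transformers.
    """

    def __init__(self, dim=128, num_depth_bins=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.depth_head = nn.Conv2d(dim, num_depth_bins, kernel_size=1)

    def forward(self, img_feat, radar_feat):
        # img_feat:   (B, C, H, W) perspective-view camera features
        # radar_feat: (B, C, H, W) radar pillars splatted onto the image plane
        B, C, H, W = img_feat.shape
        # Treat each image column independently: (B*W, H, C)
        q = img_feat.permute(0, 3, 2, 1).reshape(B * W, H, C)
        kv = radar_feat.permute(0, 3, 2, 1).reshape(B * W, H, C)
        # Associate height-ambiguous radar returns with image rows.
        fused, _ = self.attn(q, kv, kv)
        fused = fused.reshape(B, W, H, C).permute(0, 3, 2, 1)
        # Radar-aware categorical depth distribution per pixel.
        return self.depth_head(img_feat + fused).softmax(dim=1)
```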
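The RDC refinement can be pictured as a scoring step inside the radar-guided backward projection: a BEV query projected into the image should land at a depth the camera branch agrees with, and radar evidence decides how much that agreement counts. The helper below is a hypothetical sketch of such a score; the function name, inputs, and weighting scheme are assumptions rather than the paper's formulation.

```python
import torch

def radar_weighted_consistency(query_depth, depth_dist, radar_weight,
                               depth_bins):
    """Hypothetical Radar-weighted Depth Consistency score.

    A BEV query projected into the image at metric depth `query_depth` is
    scored by reading the predicted categorical depth distribution at the
    matching bin, then scaling by a radar confidence sampled from the radar
    BEV map.
    """
    # query_depth:  (N,) metric depth of each projected BEV query
    # depth_dist:   (N, D) per-pixel categorical depth distribution
    # radar_weight: (N,) radar occupancy/confidence sampled in BEV
    # depth_bins:   (D,) bin centers of the depth distribution
    idx = torch.argmin((depth_bins[None] - query_depth[:, None]).abs(), dim=1)
    consistency = depth_dist.gather(1, idx[:, None]).squeeze(1)
    # Down-weight image evidence where radar disagrees, and vice versa.
    return consistency * radar_weight
```

A score like this could then modulate the cross-attention weights when the sparse initial BEV features are refined, though how HyDRa combines the two signals in detail is not specified in this summary.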
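Finally, the claim that BEV features convert "directly" into an occupancy representation suggests a simple lifting step. One common way to do this is a channel-to-height reshape, sketched below; the summary does not say this is HyDRa's method, so treat it as one plausible realization.

```python
import torch

def bev_to_occupancy(bev_feat, num_z=16):
    """Hypothetical channel-to-height lifting of BEV features to voxels.

    Assumes the BEV channel dimension splits evenly into height bins; each
    channel group becomes the feature of one height slice, ready for a
    per-voxel semantic occupancy head.
    """
    B, C, X, Y = bev_feat.shape
    assert C % num_z == 0, "channels must split evenly into height bins"
    # (B, C, X, Y) -> (B, C // num_z, Z, X, Y)
    return bev_feat.view(B, C // num_z, num_z, X, Y)
```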