DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection

2024 | Felix Fent, Andras Palffy, Holger Caesar
The Dual Perspective Fusion Transformer (DPFT) is a novel camera-radar fusion method designed to address the limitations of existing approaches to object detection in autonomous driving. Instead of processed radar point clouds, it operates on raw 4D radar data (the radar cube) to preserve more information, and it employs projections in both the camera and ground planes to make use of radar data with elevation information. This design simplifies fusion with the camera data while retaining the complementary features of both sensors.

Concretely, DPFT derives two projections from the 4D radar cube: one parallel to the image plane, which eases camera-radar fusion, and one perpendicular to it, which preserves radar-specific information. The radar data is projected onto the range-azimuth (RA) and azimuth-elevation (AE) planes, and each input is fed through a ResNet backbone and a Feature Pyramid Network (FPN) to extract multi-scale features (see the first sketch below). These features are then fused with a modified deformable attention mechanism that queries 3D objects directly from the individual perspectives, avoiding the information loss caused by mapping everything into a single uniform feature space (see the second sketch below).
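The projection and feature-extraction stage can be pictured with a short PyTorch sketch. Everything below is illustrative rather than the paper's exact configuration: the cube shape, the max reduction used to collapse axes, and the ResNet-18 depth are assumptions.

```python
import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# Hypothetical dense 4D radar cube of power values, indexed as
# (batch, Doppler, range, azimuth, elevation). Shape and the max
# reduction below are illustrative assumptions.
cube = torch.rand(2, 32, 128, 128, 32)

# Collapse the cube onto two perpendicular planes:
# range-azimuth (parallel to the ground plane) and
# azimuth-elevation (parallel to the image plane).
ra = cube.amax(dim=(1, 4))  # (B, range, azimuth)
ae = cube.amax(dim=(1, 2))  # (B, azimuth, elevation)

# Each projection (like the camera image) gets its own ResNet+FPN branch.
branch_ra = resnet_fpn_backbone(backbone_name="resnet18", weights=None)
branch_ae = resnet_fpn_backbone(backbone_name="resnet18", weights=None)

# ResNet expects 3-channel input, so tile the single radar channel.
feats_ra = branch_ra(ra.unsqueeze(1).expand(-1, 3, -1, -1))
feats_ae = branch_ae(ae.unsqueeze(1).expand(-1, 3, -1, -1))

# Each branch yields a pyramid of multi-scale feature maps keyed by level.
for level, fmap in feats_ra.items():
    print(level, tuple(fmap.shape))
```

Keeping a separate backbone per view is what preserves the complementary nature of the inputs: nothing is warped into a shared feature space before fusion.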
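On top of these pyramids sits the fusion stage. The sketch below is a heavily simplified stand-in for the paper's modified deformable attention, not its implementation: learned 3D object queries carry normalized reference points, each point is "projected" into both planes, features are bilinearly sampled there, and a learned per-view weight mixes the two samples. The class name, the axis-dropping projection, and the single-level, single-point sampling are all assumptions for illustration; full deformable attention would additionally predict sampling offsets around each reference point and attend over multiple pyramid levels and heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPerspectiveFusion(nn.Module):
    """Simplified stand-in for DPFT's modified deformable attention:
    learned 3D queries sample features from both views at projected
    reference points and mix them with learned per-view weights."""

    def __init__(self, num_queries: int = 100, dim: int = 256):
        super().__init__()
        self.query = nn.Embedding(num_queries, dim)  # object queries
        self.ref = nn.Embedding(num_queries, 3)      # 3D reference points (logits)
        self.view_weight = nn.Linear(dim, 2)         # per-view mixing weights

    def forward(self, feat_ra: torch.Tensor, feat_ae: torch.Tensor) -> torch.Tensor:
        # feat_ra: (B, C, H, W) range-azimuth features
        # feat_ae: (B, C, H, W) azimuth-elevation features
        B = feat_ra.size(0)
        q = self.query.weight.unsqueeze(0).expand(B, -1, -1)  # (B, Q, C)
        ref = self.ref.weight.sigmoid()                       # (Q, 3) in [0, 1]

        # "Project" each 3D point into a plane by dropping the axis
        # perpendicular to it -- a crude placeholder for the real
        # geometric projection.
        grid_ra = ref[:, [0, 1]]  # (x, y) -> range-azimuth plane
        grid_ae = ref[:, [1, 2]]  # (y, z) -> azimuth-elevation plane

        def sample(feat: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
            # grid_sample wants normalized coords in [-1, 1], laid out
            # as (B, H_out, W_out, 2); one sample point per query here.
            g = grid.mul(2).sub(1).view(1, -1, 1, 2).expand(B, -1, -1, -1)
            s = F.grid_sample(feat, g, align_corners=False)  # (B, C, Q, 1)
            return s.squeeze(-1).transpose(1, 2)             # (B, Q, C)

        w = self.view_weight(q).softmax(dim=-1)              # (B, Q, 2)
        fused = w[..., :1] * sample(feat_ra, grid_ra) + \
                w[..., 1:] * sample(feat_ae, grid_ae)
        return fused + q  # residual; a detection head would consume this

# Self-contained check with dummy single-level feature maps.
feat_ra = torch.rand(2, 256, 32, 32)
feat_ae = torch.rand(2, 256, 32, 8)
out = DualPerspectiveFusion()(feat_ra, feat_ae)
print(tuple(out.shape))  # (2, 100, 256)
```

Because the queries sample each view directly, no intermediate bird's-eye-view or voxel representation is needed, which is the core of the "dual perspective" idea.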
DPFT achieves a state-of-the-art mean average precision (mAP) of 56.1% on the K-Radar dataset and outperforms existing methods in severe weather conditions. The model is robust against sensor failure and has a low inference time of 87 ms, making it suitable for real-world applications. Its computational complexity is lower than that of other fusion approaches, and it shows promising results across different input modality combinations.

Despite these results, the model struggles to detect objects moving tangentially to the ego vehicle's direction of travel and to differentiate between closely spaced objects. Its generalization has so far only been demonstrated on the K-Radar dataset, and further research is needed to improve its performance across a wider range of scenarios.