RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection

25 Mar 2024 | Zhiwei Lin1*, Zhe Liu2*, Zhongyu Xia1, Xinhao Wang1, Yongtao Wang1†, Shengxiang Qi3, Yang Dong3, Nan Dong3, Le Zhang2, Ce Zhu2
The paper introduces RCBEVDet, a radar-camera fusion 3D object detection method designed for bird's eye view (BEV) representation. The method aims to improve the accuracy and robustness of 3D object detection by combining multi-view cameras with millimeter-wave radar sensors. Key contributions include the design of RadarBEVNet for efficient radar BEV feature extraction and the Cross-Attention Multi-layer Fusion (CAMF) module for robust multi-modal feature alignment and fusion. RadarBEVNet consists of a dual-stream radar backbone and an RCS-aware BEV encoder, which processes radar data using both point-based and transformer-based encoders. The CAMF module uses deformable cross-attention to align radar and camera BEV features and then fuses them using channel and spatial fusion layers.

Experimental results show that RCBEVDet achieves state-of-the-art performance on the nuScenes and View-of-Delft (VoD) datasets, outperforming camera-only and radar-camera fusion methods in terms of accuracy and speed. The source code is available at <https://github.com/VDIGPKU/RCBEVDet>.
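To make the CAMF fusion step concrete, the following is a minimal NumPy sketch of fusing two already-aligned BEV feature maps with a channel fusion step followed by a spatial gate. This is an illustration of the general idea, not the paper's implementation: the deformable cross-attention alignment is assumed to have already been applied, and the names `fuse_bev_features` and `w_channel` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_bev_features(cam_bev, radar_bev, w_channel):
    """Illustrative channel-then-spatial fusion of two aligned BEV maps.

    cam_bev, radar_bev: (C, H, W) feature maps, assumed already aligned
    (RCBEVDet uses deformable cross-attention for alignment; that step
    is omitted here).
    w_channel: (C, 2C) mixing matrix standing in for a learned 1x1 conv.
    """
    # Channel fusion: concatenate along channels, then mix them back to C
    # channels with a 1x1-convolution-like linear map.
    stacked = np.concatenate([cam_bev, radar_bev], axis=0)          # (2C, H, W)
    mixed = np.tensordot(w_channel, stacked, axes=([1], [0]))       # (C, H, W)
    # Spatial fusion: a per-location sigmoid gate computed from the
    # channel-averaged response re-weights the fused map.
    gate = sigmoid(mixed.mean(axis=0, keepdims=True))               # (1, H, W)
    return mixed * gate

# Toy usage: fuse small random camera and radar BEV grids.
rng = np.random.default_rng(0)
cam = rng.standard_normal((4, 8, 8))
radar = rng.standard_normal((4, 8, 8))
w = 0.1 * rng.standard_normal((4, 8))
fused = fuse_bev_features(cam, radar, w)
print(fused.shape)  # (4, 8, 8)
```

In the actual method the channel and spatial fusion layers are learned jointly with the rest of the network; here the mixing matrix and gate merely show how the two modalities' BEV features are combined at each location.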