23 Apr 2024 | Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia*, Feiyang Jia, Li Wang
RoboFusion is a robust framework for multi-modal 3D object detection in autonomous driving (AD) that leverages visual foundation models (VFMs) such as SAM to improve performance under out-of-distribution (OOD) noise. The framework comprises four modules: SAM-AD, which adapts SAM to AD scenarios; AD-FPN, which upsamples features; Depth-Guided Wavelet Attention (DGWA), which applies wavelet decomposition to denoise depth-guided images; and Adaptive Fusion, which uses self-attention to adaptively reweight the fused features, enhancing informative features while suppressing noise. Evaluated on both clean and noisy datasets, RoboFusion achieves state-of-the-art performance on the noisy benchmarks KITTI-C and nuScenes-C and outperforms existing methods across a range of noise scenarios, including weather and sensor noise. Its use of VFMs helps it generalize to OOD scenarios, making it a promising approach for robust 3D object detection in AD.
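
The idea of denoising through wavelet decomposition can be illustrated with a generic sketch. The code below is not RoboFusion's DGWA module, whose details are not given in this summary; it is a minimal single-level 2D Haar decomposition with soft thresholding of the detail sub-bands, written in NumPy. The function names, the choice of the Haar wavelet, and the threshold value are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform (assumes even height and width)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0            # low-pass approximation
    lh = (a - b + c - d) / 2.0            # horizontal differences
    hl = (a + b - c - d) / 2.0            # vertical differences
    hh = (a - b - c + d) / 2.0            # diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero; values below the threshold become 0."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)


def wavelet_denoise(x, threshold):
    """Hypothetical helper: keep the approximation, shrink the detail bands."""
    ll, lh, hl, hh = haar_dwt2(np.asarray(x, dtype=np.float64))
    lh, hl, hh = (soft_threshold(s, threshold) for s in (lh, hl, hh))
    return haar_idwt2(ll, lh, hl, hh)


# Usage: a flat synthetic "feature map" corrupted by Gaussian noise.
rng = np.random.default_rng(0)
clean = np.full((32, 32), 0.5)
noisy = clean + rng.normal(scale=0.1, size=clean.shape)
denoised = wavelet_denoise(noisy, threshold=0.3)  # threshold chosen arbitrarily
print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
print("denoised MSE:", np.mean((denoised - clean) ** 2))
```

Thresholding only the detail sub-bands removes much of the high-frequency noise while leaving the low-frequency approximation untouched. A learned attention mechanism, as the summary describes, would presumably weight these sub-bands adaptively rather than with a fixed threshold.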