12 Mar 2024 | Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou
The paper introduces Ray Denoising, a method that improves multi-view 3D object detection by addressing the difficulty of estimating depth from images. It samples along camera rays to construct hard negative examples, which are visually difficult to distinguish from true positives. Training on these examples forces the model to learn depth-aware features, improving its ability to separate true positives from false positives. Ray Denoising is a plug-and-play module compatible with any DETR-style multi-view 3D detector; it adds minimal computational overhead during training and does not affect inference speed.
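The core idea of sampling hard negatives along a camera ray can be illustrated with a small geometric sketch. This is not the authors' implementation. It assumes a single camera center and a ground-truth box center in the same 3D frame, draws depth offsets uniformly from a band that excludes a small zone around the true depth, and labels every sampled point as a negative (background) query. The names `sample_ray_negatives`, `DenoisingQuery`, `min_depth_gap` and `max_depth_offset`, as well as the uniform sampling scheme, are all hypothetical. In a DETR-style detector such queries would only be added during training, which is consistent with the summary's claim that inference speed is unaffected.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class DenoisingQuery:
    """Hypothetical training-only query: a 3D reference point plus its target label."""
    center: tuple       # (x, y, z) in the same frame as the camera center
    is_positive: bool   # False -> should be classified as background


def sample_ray_negatives(camera_center, gt_center, num_samples,
                         min_depth_gap, max_depth_offset, rng):
    """Sample points on the ray from camera_center through gt_center (sketch).

    Assumption: offsets along the ray are drawn uniformly with magnitude in
    [min_depth_gap, max_depth_offset], so points too close to the true depth
    (which would be ambiguous) are never used as negatives.
    """
    if not 0 <= min_depth_gap < max_depth_offset:
        raise ValueError("need 0 <= min_depth_gap < max_depth_offset")

    cx, cy, cz = camera_center
    dx, dy, dz = (g - c for g, c in zip(gt_center, camera_center))
    gt_depth = math.sqrt(dx * dx + dy * dy + dz * dz)
    if gt_depth == 0:
        raise ValueError("ground-truth center coincides with the camera center")
    ux, uy, uz = dx / gt_depth, dy / gt_depth, dz / gt_depth  # unit ray direction

    negatives = []
    for _ in range(num_samples):
        magnitude = rng.uniform(min_depth_gap, max_depth_offset)
        sign = rng.choice((-1, 1))
        if gt_depth + sign * magnitude <= 0:
            sign = 1  # never place a sample at or behind the camera
        depth = gt_depth + sign * magnitude
        point = (cx + ux * depth, cy + uy * depth, cz + uz * depth)
        # Same ray as the true object, wrong depth: a hard negative.
        negatives.append(DenoisingQuery(center=point, is_positive=False))
    return negatives


if __name__ == "__main__":
    rng = random.Random(0)
    for q in sample_ray_negatives((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 4, 1.0, 5.0, rng):
        print(q)
```

Each sampled point projects to the same pixel as the real object but lies at a different depth, so a detector that relies only on appearance cannot reject it; this is the intuition behind using such points as hard negatives.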
Comprehensive experiments on the nuScenes and Argoverse 2 datasets show that Ray Denoising outperforms strong baselines, improving mean Average Precision (mAP) by 1.9% over the state-of-the-art StreamPETR method on nuScenes and yielding significant gains on Argoverse 2. Ablation studies further confirm that Ray Denoising substantially reduces false positive predictions along camera rays, which improves overall detection performance.