12 Mar 2024 | Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou
This paper introduces Ray Denoising, a method for multi-view 3D object detection that targets false positive predictions along camera rays. The method samples points along camera rays to construct hard negative examples, which are visually difficult to differentiate from true positives. These examples compel the model to learn depth-aware features, improving its ability to distinguish true positives from false positives on the same ray. The samples are drawn using the Beta distribution, which makes the hard negatives depth-aware. Ray Denoising is designed as a plug-and-play module compatible with any DETR-style multi-view 3D detector, with minimal additional training cost and no impact on inference speed. Comprehensive experiments on the NuScenes and Argoverse 2 datasets show that Ray Denoising outperforms strong baselines, achieving a 1.9% improvement in mean Average Precision (mAP) over the state-of-the-art StreamPETR method on the NuScenes dataset and significant performance gains on the Argoverse 2 dataset. The code is available at https://github.com/LiewFeng/RayDN.

The key contributions are identifying the persistent challenge of false positive predictions along the same ray, introducing Ray Denoising as a denoising method that uses the Beta distribution to create depth-aware hard negative samples, and demonstrating its effectiveness on the NuScenes dataset, where it significantly improves the performance of multi-view 3D object detectors.
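The idea of sampling depth-aware hard negatives along a camera ray can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function name `sample_ray_queries`, the shape parameters `alpha` and `beta`, the offset range `max_offset`, and the mapping from Beta samples to signed depth offsets are all assumptions made for illustration.

```python
import numpy as np

def sample_ray_queries(cam_center, gt_center, num_samples=3,
                       alpha=2.0, beta=2.0, max_offset=4.0, rng=None):
    """Sample points on the camera ray through a ground-truth object center.

    Hypothetical helper: all parameter names and defaults are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    cam_center = np.asarray(cam_center, dtype=float)
    gt_center = np.asarray(gt_center, dtype=float)

    # Unit direction of the ray from the camera to the object, and its depth.
    ray = gt_center - cam_center
    depth = np.linalg.norm(ray)
    direction = ray / depth

    # Beta samples lie in [0, 1]; map them to signed depth offsets in
    # [-max_offset, max_offset] around the ground-truth depth (assumed mapping).
    offsets = (2.0 * rng.beta(alpha, beta, size=num_samples) - 1.0) * max_offset

    # Keep sampled depths in front of the camera.
    depths = np.clip(depth + offsets, 1e-3, None)

    # Every sampled point shares the ray direction and differs only in depth.
    points = cam_center + depths[:, None] * direction
    return points, depths

if __name__ == "__main__":
    pts, d = sample_ray_queries([0.0, 0.0, 0.0], [10.0, 0.0, 0.0],
                                num_samples=5, rng=np.random.default_rng(0))
    print(pts)
    print(d)
```

Because every sampled point projects to the same image location as the object, a detector can only tell them apart by depth, which is the property the summary attributes to these hard negatives. How the samples are labeled and supervised during training is left out of this sketch.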