Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

12 Mar 2024 | JunDa Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, Xin Yang
This paper addresses the challenge of depth estimation in autonomous driving scenarios, where multi-view depth estimation methods often fail due to noisy camera poses. The authors propose AFNet (Adaptive Fusion Network), which integrates single-view and multi-view depth estimation to achieve robust and accurate depth predictions. AFNet uses a two-branch network, one branch for monocular depth estimation and the other for multi-view depth estimation, each predicting a depth map and a confidence map. An adaptive fusion module then dynamically selects the more reliable branch based on the confidence maps, maintaining high precision and robustness under noisy poses. The method is evaluated on the DDAD and KITTI datasets, demonstrating superior performance compared to state-of-the-art methods, especially in dynamic object regions and under noisy poses. The authors also introduce a new robustness testing benchmark for evaluating multi-view methods under noisy poses, on which AFNet outperforms classical multi-view methods.
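The confidence-guided fusion can be pictured with a minimal sketch along the following lines. This is an illustrative approximation rather than the authors' implementation: the tensor names (d_mono, d_multi, c_mono, c_multi) and the per-pixel confidence-weighted average are assumptions about how the outputs of the two branches might be combined.

    import torch

    def fuse_depths(d_mono, d_multi, c_mono, c_multi, eps=1e-6):
        # d_*: [B, 1, H, W] depth maps from the monocular and multi-view branches
        # c_*: [B, 1, H, W] confidence maps predicted alongside each depth map
        # (hypothetical interface; AFNet's actual fusion module may differ)
        w = torch.stack([c_mono, c_multi], dim=0)
        w = w / (w.sum(dim=0, keepdim=True) + eps)   # normalize confidences per pixel
        return w[0] * d_mono + w[1] * d_multi        # lean on the more confident branch

Under this kind of scheme, the intended behavior is that when camera poses are noisy the multi-view branch's confidence drops, so the fused depth falls back toward the monocular prediction, which is the robustness property the paper's benchmark is designed to measure.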