Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

2024 | JunDa Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, Xin Yang
This paper proposes AFNet, a depth estimation system that adaptively fuses single-view and multi-view depth to improve robustness and accuracy in autonomous driving scenarios. It addresses a key limitation of existing multi-view depth estimation methods, which often fail under noisy camera poses. AFNet pairs a two-branch network, one branch for single-view depth and one for multi-view depth, with an adaptive fusion module that selects the more reliable estimate from the two branches at each pixel based on confidence maps. The fusion module relies on a wrapping confidence map, generated by checking the consistency of multi-view textures under the given camera poses, to judge how trustworthy each depth prediction is. This lets the system cope with challenging conditions such as textureless regions, dynamic objects, and inaccurate calibration.

In robustness tests, AFNet outperforms state-of-the-art multi-view and fusion methods, and with accurate poses it achieves state-of-the-art results on the challenging KITTI and DDAD benchmarks. Evaluations under both real-world and synthetic pose noise show that it improves depth accuracy and robustness over existing methods, particularly in dynamic-object regions and under noisy poses.

The system is implemented in PyTorch and trained on the DDAD and KITTI datasets, performing well in both single-view and multi-view depth estimation. Cross-dataset experiments further demonstrate strong generalization. Overall, AFNet offers a more reliable and accurate depth estimation solution for real-world autonomous driving.
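The summary does not spell out how the wrapping confidence map is computed, so the following is only a minimal PyTorch sketch of a warping-based consistency check of the kind described: the reference-view depth and the relative camera pose warp the source view into the reference view, and the photometric error of that warp is turned into a per-pixel confidence. All function and argument names here (`warp_to_reference`, `wrapping_confidence`, `T_ref_to_src`, and so on) are illustrative assumptions, not AFNet's actual interface.

```python
import torch
import torch.nn.functional as F


def warp_to_reference(src_img, ref_depth, K, K_inv, T_ref_to_src):
    """Warp a source view into the reference view using the reference depth.

    src_img      : (B, C, H, W) source-view image or feature map
    ref_depth    : (B, 1, H, W) depth predicted for the reference view
    K, K_inv     : (B, 3, 3) camera intrinsics and their inverse
    T_ref_to_src : (B, 4, 4) relative pose from reference to source camera
    """
    B, _, H, W = ref_depth.shape
    device, dtype = ref_depth.device, ref_depth.dtype

    # Homogeneous pixel grid of the reference view: (B, 3, H*W)
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=dtype),
        torch.arange(W, device=device, dtype=dtype),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D in the reference camera, then move into the source camera
    cam_ref = (K_inv @ pix) * ref_depth.reshape(B, 1, -1)                  # (B, 3, H*W)
    cam_ref_h = torch.cat(
        [cam_ref, torch.ones(B, 1, H * W, device=device, dtype=dtype)], dim=1
    )
    cam_src = (T_ref_to_src @ cam_ref_h)[:, :3]                            # (B, 3, H*W)

    # Project onto the source image plane and normalise to [-1, 1] for grid_sample
    proj = K @ cam_src
    z = proj[:, 2].clamp(min=1e-6)
    u = proj[:, 0] / z / (W - 1) * 2 - 1
    v = proj[:, 1] / z / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)

    return F.grid_sample(src_img, grid, padding_mode="zeros", align_corners=True)


def wrapping_confidence(ref_img, src_img, ref_depth, K, K_inv, T_ref_to_src):
    """Per-pixel confidence from the photometric consistency of the warp.

    Large warping error (texture mismatch, wrong pose, dynamic objects)
    maps to confidence near 0; small error maps to confidence near 1.
    """
    warped = warp_to_reference(src_img, ref_depth, K, K_inv, T_ref_to_src)
    error = (warped - ref_img).abs().mean(dim=1, keepdim=True)             # (B, 1, H, W)
    return torch.exp(-error)
```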
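Given such a confidence map, the adaptive fusion itself can be pictured as a learned per-pixel blend of the two depth branches. Again, this is only a hedged sketch: the module name, layer sizes, and inputs are illustrative choices, not the paper's exact design.

```python
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Minimal sketch: per-pixel fusion of single-view and multi-view depth.

    A small convolutional head predicts a fusion weight from the two depth
    maps and the confidence map; the output leans on multi-view depth where
    it looks reliable and falls back to single-view depth elsewhere.
    """

    def __init__(self, hidden=32):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(3, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, depth_single, depth_multi, confidence):
        # All inputs are (B, 1, H, W); the weight w is in (0, 1) per pixel.
        w = self.head(torch.cat([depth_single, depth_multi, confidence], dim=1))
        return w * depth_multi + (1.0 - w) * depth_single
```

In this sketch the network can learn to drive the weight toward the single-view branch wherever the warping confidence is low (for example on dynamic objects or under bad poses), which is the behaviour the paper's adaptive fusion is designed to achieve.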