1 Apr 2024 | Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, Ping Tan
This paper introduces a Bilateral Propagation Network (BP-Net) for depth completion. BP-Net propagates depth at the earliest stage rather than at the refinement stage, so the subsequent multi-modal fusion stage is not affected by the sparsity problem. The proposed bilateral propagation module dynamically predicts propagation coefficients conditioned on both radiometric difference and spatial distance, enabling depth propagation that prefers the nearest values in both the spatial domain and the intensity range. Experimental results demonstrate the outstanding performance of BP-Net and underline the importance of propagating at the earliest stage rather than at the refinement stage. Because BP-Net consists mostly of local propagation and convolution operations, it may be limited to local structure and have difficulty delivering long-range information. Combining BP-Net with global and non-local operations, such as Transformers for multi-modal fusion and non-local propagation for depth refinement, is a direction for future work.
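To make the bilateral idea concrete, the sketch below fills a sparse depth map by a bilateral-weighted average of nearby valid samples. It is a minimal, hand-crafted stand-in: the coefficients here come from fixed Gaussian kernels over spatial distance and radiometric difference, whereas BP-Net predicts them dynamically from image content. All names and parameters (`sigma_s`, `sigma_r`, `radius`, a single-channel `image`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bilateral_propagate(sparse_depth, image, sigma_s=3.0, sigma_r=0.1, radius=4):
    """Fill holes in a sparse depth map by bilateral-weighted averaging.

    sparse_depth: HxW array, 0 where no depth sample exists.
    image: HxW grayscale guidance image in [0, 1].
    Weights combine spatial distance and radiometric (intensity) difference,
    so propagation prefers the nearest values in both domain and range.
    """
    h, w = sparse_depth.shape
    out = np.zeros_like(sparse_depth, dtype=float)
    valid = sparse_depth > 0
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch_v = valid[y0:y1, x0:x1]
            if not patch_v.any():
                continue  # no valid sample in the window; leave as 0
            yy, xx = np.mgrid[y0:y1, x0:x1]
            spatial = ((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2)
            radiometric = (image[y0:y1, x0:x1] - image[y, x]) ** 2 / (2 * sigma_r ** 2)
            w_coef = np.exp(-spatial - radiometric) * patch_v  # zero weight at holes
            out[y, x] = (w_coef * sparse_depth[y0:y1, x0:x1]).sum() / (w_coef.sum() + 1e-8)
    return out
```

In the paper's design, a dense preliminary depth produced this way at the earliest stage is what shields the following multi-modal fusion from having to cope with sparse input directly.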
BP-Net achieves state-of-the-art performance on the NYUv2 dataset and ranks first on the KITTI depth completion benchmark. The method integrates bilateral propagation with multi-modal fusion and depth refinement in a multi-scale framework, performing well on both indoor and outdoor scenes. The bilateral propagation module is a non-linear model whose coefficients are dynamically generated from image content and spatial distance. A multi-scale architecture estimates depth from coarse to fine, with low-resolution results serving as a prior for high-resolution estimation, and the whole network is trained end-to-end with a multi-scale loss that provides adequate supervision for the depth map estimated at each scale. Compared with other state-of-the-art methods, BP-Net shows superior performance on both KITTI and NYUv2, producing depth maps with clear object boundaries and rich details, and it remains robust to different sparsity levels, achieving the lowest RMSE and highest δ₁.₂₅ across the levels tested. Ablation studies further confirm the effectiveness of the bilateral propagation module and the importance of early-stage propagation compared with propagation at the refinement stage.
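The multi-scale supervision can be pictured with a short PyTorch-style sketch: the ground truth is resized to each prediction scale and a loss term is accumulated over valid pixels only. The function name, the choice of an L2 term, and the per-scale weights are assumptions for illustration; the paper's exact loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def multi_scale_loss(preds, gt_depth, weights=None):
    """Accumulate an L2 loss over coarse-to-fine depth predictions.

    preds: list of [B, 1, h_i, w_i] depth maps, coarse to fine.
    gt_depth: [B, 1, H, W] ground-truth depth, 0 where unmeasured.
    The ground truth is downsampled to each scale and the loss is
    evaluated only on pixels that have a valid measurement.
    """
    if weights is None:
        weights = [1.0] * len(preds)  # assumed uniform weighting
    total = gt_depth.new_tensor(0.0)
    for pred, w in zip(preds, weights):
        gt_s = F.interpolate(gt_depth, size=pred.shape[-2:], mode="nearest")
        mask = gt_s > 0  # supervise only where ground truth exists
        if mask.any():
            total = total + w * F.mse_loss(pred[mask], gt_s[mask])
    return total
```

Supervising every scale this way is what lets the coarse, low-resolution result act as a reliable prior for the next, higher-resolution estimation stage.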