Tri-Perspective View Decomposition for Geometry-Aware Depth Completion


22 Mar 2024 | Zhiqiang Yan, Yuankai Lin, Kun Wang, Yupeng Zheng, Yufei Wang, Zhenyu Zhang, Jun Li*, and Jian Yang*
The paper introduces Tri-Perspective View Decomposition (TPVD), a novel framework for depth completion that reconstructs precise 3D geometry from sparse and noisy depth measurements. TPVD decomposes the 3D point cloud into three 2D views (top, front, and side), where the front view corresponds to the sparse depth input; this decomposition lets the sparse 3D point cloud be densified in 2D space using ordinary 2D convolutions. The framework employs a recurrent 2D-3D-2D TPV Fusion scheme to exploit 3D geometric priors: the 2D TPV features are projected back to 3D space to obtain coarse structural representations, a Distance-Aware Spherical Convolution (DASC) encodes the points, whose spatial distribution varies with distance, in a compact spherical space, and the resulting 3D spherical features are reprojected into 2D space to update the initial 2D TPV features. In addition, a Geometric Spatial Propagation Network (GSPN) further improves geometric consistency by adaptively choosing TPV affinity neighbors.

TPVD outperforms existing methods on multiple datasets, including KITTI, NYUv2, SUN RGBD, and a new smartphone-based dataset named TOFDC, which is acquired with a time-of-flight (TOF) sensor and a color camera on smartphones. The paper also examines TPVD's effectiveness in varied scenarios, such as depth-only input, varying numbers of valid points, and complex lighting and weather conditions.
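To make the decomposition step concrete, the sketch below scatters a 3D point cloud onto top, front, and side planes as sparse 2D maps, the representation on which TPVD's 2D convolutions can then operate. It is a minimal illustration, not the authors' implementation: the grid resolution, scene bounds, axis conventions, and last-write handling of collisions are all assumptions.

```python
import numpy as np

def tpv_decompose(points, grid=(256, 256), bounds=((-10, 10), (-10, 10), (0, 20))):
    """Scatter a point cloud (N, 3) into three sparse 2D views.

    Front view stores depth z over (x, y); top view stores height y over
    (x, z); side view stores lateral offset x over (z, y).  A rough sketch
    of tri-perspective view decomposition under assumed conventions.
    """
    (x0, x1), (y0, y1), (z0, z1) = bounds
    H, W = grid
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    def scatter(u, v, val, u_rng, v_rng):
        # Normalize coordinates into pixel indices and keep in-bounds points.
        ui = ((u - u_rng[0]) / (u_rng[1] - u_rng[0]) * (W - 1)).astype(int)
        vi = ((v - v_rng[0]) / (v_rng[1] - v_rng[0]) * (H - 1)).astype(int)
        ok = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
        view = np.zeros((H, W), dtype=np.float32)
        view[vi[ok], ui[ok]] = val[ok]  # last write wins on grid collisions
        return view

    front = scatter(x, y, z, (x0, x1), (y0, y1))  # depth over the image plane
    top   = scatter(x, z, y, (x0, x1), (z0, z1))  # height over the ground plane
    side  = scatter(z, y, x, (z0, z1), (y0, y1))  # lateral offset over depth/height
    return front, top, side
```

Each of the three sparse maps can then be densified with standard image convolutions and the results fused, which is the role of TPVD's recurrent TPV Fusion scheme.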
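The 2D-3D-2D fusion loop also depends on mapping the lifted 3D points into the compact spherical space where DASC operates. Below is a minimal sketch of that Cartesian-to-spherical coordinate change; the axis conventions and the helper name `to_spherical` are illustrative assumptions, not the paper's code.

```python
import numpy as np

def to_spherical(points):
    """Map Cartesian points (N, 3) to (range, azimuth, elevation).

    DASC convolves over such a compact spherical space; this helper only
    shows the coordinate change, with axes chosen for illustration.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(x, z)  # angle in the x-z plane
    elevation = np.arcsin(np.clip(y / np.maximum(r, 1e-8), -1.0, 1.0))
    return np.stack([r, azimuth, elevation], axis=1)
```

Since the angular density of points in (azimuth, elevation) shrinks as the range r grows, conditioning the convolution on r is a natural way to handle the varying point distributions, which motivates making the spherical convolution distance-aware.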