GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose

12 Mar 2018 | Zhichao Yin and Jianping Shi
GeoNet is an unsupervised learning framework that jointly estimates monocular depth, optical flow, and camera pose from video sequences. It exploits the geometric relationships inherent in 3D scenes to learn the three tasks end to end. GeoNet uses a cascaded, two-stage architecture: the first stage reconstructs the rigid structure of the scene from depth maps and camera poses, and the second stage localizes non-rigid motion by predicting residual flow. An adaptive geometric consistency loss increases robustness to outliers and non-Lambertian regions, which helps the framework handle occlusions and texture ambiguities. Experiments on the KITTI dataset show that GeoNet achieves state-of-the-art results on all three tasks, outperforming previous unsupervised methods and performing comparably to supervised ones. This effectiveness is attributed to the framework's ability to capture high-level cues and feature correspondences, which makes it suitable for applications such as autonomous driving and robotics.
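To make the first stage more concrete, the following is a minimal sketch of how a depth map and a relative camera pose can induce a "rigid" optical flow for a static scene. It assumes a standard pinhole camera model. The function name `rigid_flow`, the intrinsics `K`, and the pose `T_t2s` are illustrative choices, not names taken from the GeoNet code, and the sketch uses NumPy rather than a deep learning framework.

```python
import numpy as np

def rigid_flow(depth, K, T_t2s):
    """Flow induced purely by camera motion, assuming a static scene.

    depth : (H, W) depth map of the target frame
    K     : (3, 3) pinhole intrinsics (assumed known)
    T_t2s : (4, 4) relative camera pose, target -> source (hypothetical name)
    Returns an (H, W, 2) array of (dx, dy) displacements in pixels.
    """
    H, W = depth.shape
    # Homogeneous pixel coordinates of the target frame, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    # Back-project each pixel to a 3D point in the target camera frame.
    cam_t = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Move the points into the source camera frame.
    cam_s = T_t2s[:3, :3] @ cam_t + T_t2s[:3, 3:4]
    # Project into the source image and subtract the original pixel positions.
    proj = K @ cam_s
    flow = proj[:2] / proj[2:3] - pix[:2]
    return flow.T.reshape(H, W, 2)

# Toy example: a fronto-parallel plane at depth 10, camera shifted along x.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 10.0)
T = np.eye(4)
T[0, 3] = 0.5
flow = rigid_flow(depth, K, T)
```

In the cascaded design described above, the second stage would then add a learned residual flow on top of this rigid flow to account for moving objects. That step requires a trained network and is not shown here.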