This paper presents Depth Anything, a practical solution for robust monocular depth estimation (MDE) that leverages large-scale unlabeled data. The authors aim to build a foundation model capable of handling images captured under any conditions, scaling up the dataset through a data engine that collects and automatically annotates 62 million unlabeled images. Two key strategies are introduced: challenging the optimization target with strong data augmentation, which forces the model to seek extra visual knowledge, and inheriting rich semantic priors from a pre-trained encoder. The model demonstrates impressive generalization across diverse scenes, including low-light environments, complex layouts, foggy weather, and ultra-long distances. Extensive evaluations on six public datasets and on randomly captured photos show that Depth Anything outperforms existing methods in zero-shot depth estimation and sets new state-of-the-art (SOTA) results when fine-tuned with metric depth labels from NYUv2 and KITTI. The model also improves the performance of depth-conditioned ControlNet. The paper highlights the value of large-scale unlabeled data and proposes effective strategies to fully exploit its potential, making it a significant contribution to the field of MDE.
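To make the second strategy concrete, the semantic-prior idea can be sketched as a per-pixel feature-alignment loss: the depth model's features are pulled toward those of a frozen pre-trained encoder, while pixels that are already sufficiently similar are skipped so the depth branch can retain fine-grained detail the semantic encoder discards. This is a minimal NumPy sketch under stated assumptions; the function name `feature_alignment_loss` and the tolerance value are illustrative, not the paper's exact implementation.

```python
import numpy as np

def feature_alignment_loss(student_feat, teacher_feat, tol=0.85):
    """Hedged sketch of a semantic-prior loss: align the depth model's
    per-pixel features (H, W, C) with a frozen pre-trained encoder's
    features, ignoring pixels whose cosine similarity already exceeds
    a tolerance `tol` (values here are illustrative assumptions)."""
    # Normalize each pixel's feature vector to unit length.
    s = student_feat / np.linalg.norm(student_feat, axis=-1, keepdims=True)
    t = teacher_feat / np.linalg.norm(teacher_feat, axis=-1, keepdims=True)
    cos = np.sum(s * t, axis=-1)   # per-pixel cosine similarity, shape (H, W)
    mask = cos < tol               # only align pixels that are not yet similar
    if not mask.any():
        return 0.0                 # everything already aligned past tolerance
    return float(np.mean(1.0 - cos[mask]))
```

In training, a loss of this form would be added to the supervised and pseudo-labeled depth losses, so the unlabeled branch benefits from both the augmentation-driven objective and the semantic prior.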