Depth Anything V2 is a monocular depth estimation model that improves on its predecessor in three ways: it trains the teacher model exclusively on synthetic images, scales the teacher up, and then distills student models on large-scale pseudo-labeled real images. The released models range from 25M to 1.3B parameters, outperform previous models in speed, accuracy, and efficiency, remain robust to complex scenes, transparent objects, and reflective surfaces, and can be fine-tuned for various downstream tasks.

Synthetic images supply precise depth labels but bring two challenges: a distribution shift relative to real photographs and limited scene coverage. The paper bridges this gap with large-scale unlabeled real images, which the synthetic-trained teacher annotates with pseudo-labels before the student models are trained on them (a minimal sketch of this recipe follows below).

The paper also introduces DA-2K, a new evaluation benchmark of diverse high-resolution images with precise annotations, intended to assess depth estimation models more reliably than existing test sets. Evaluated on standard benchmarks such as NYU-D and KITTI, Depth Anything V2 improves on previous models, and the authors conclude that it is a more capable foundation model for monocular depth estimation, with the potential to be used in a wide range of applications.
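As a concrete illustration of the pseudo-labeling recipe described above, here is a minimal sketch in PyTorch. Everything in it is illustrative: the teacher, student, dataset, and helper names are hypothetical stand-ins rather than the authors' actual training code, and the loss shown is the standard affine-invariant regression loss commonly used for relative depth, not necessarily the paper's exact objective.

```python
# Sketch of the V2 recipe: a large teacher trained on synthetic data
# pseudo-labels unlabeled real images, then smaller students are
# distilled from those pseudo-labels. Illustrative only.
import torch


def affine_invariant_loss(pred, target):
    """L1 loss after aligning both maps to zero median and unit scale,
    so only relative depth structure is penalized."""
    def norm(d):
        t = d.median()
        s = (d - t).abs().mean().clamp(min=1e-6)
        return (d - t) / s
    return (norm(pred) - norm(target)).abs().mean()


def pseudo_label(teacher, real_images, batch_size=8):
    """Run the frozen teacher over unlabeled real images to produce
    relative-depth pseudo-labels."""
    teacher.eval()
    labels = []
    with torch.no_grad():
        for i in range(0, len(real_images), batch_size):
            batch = torch.stack(real_images[i:i + batch_size])
            labels.append(teacher(batch))
    return torch.cat(labels)


def distill_student(student, real_images, pseudo_depths, epochs=1, lr=1e-4):
    """Train a smaller student to regress the teacher's pseudo-labels."""
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    for _ in range(epochs):
        for img, target in zip(real_images, pseudo_depths):
            pred = student(img.unsqueeze(0)).squeeze(0)
            loss = affine_invariant_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()


if __name__ == "__main__":
    import torch.nn as nn
    # Toy stand-ins so the sketch runs end to end; the real teacher and
    # students are transformer-based depth networks of very different sizes.
    teacher = nn.Conv2d(3, 1, 3, padding=1)
    student = nn.Conv2d(3, 1, 3, padding=1)
    real_images = [torch.rand(3, 64, 64) for _ in range(4)]
    depths = pseudo_label(teacher, real_images)
    distill_student(student, real_images, depths)
```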
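To try the released checkpoints, one simple route is the Hugging Face Transformers depth-estimation pipeline. The hub id below is an assumption based on the project's published checkpoints; check the model cards for the current names.

```python
# Inference with a released Depth Anything V2 checkpoint via the
# Transformers depth-estimation pipeline. The model id is an assumed
# example; substitute the checkpoint size you need.
from transformers import pipeline
from PIL import Image

pipe = pipeline("depth-estimation",
                model="depth-anything/Depth-Anything-V2-Small-hf")
image = Image.open("example.jpg")
result = pipe(image)
result["depth"].save("depth.png")  # PIL image of per-pixel relative depth
```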