This paper presents *Depth Anything V2*, an advanced monocular depth estimation (MDE) model that outperforms its predecessor, V1, in robustness and fine-grained detail. Key improvements include replacing labeled real images with synthetic images, scaling up the teacher model, and training student models on large-scale pseudo-labeled real images. Compared to Stable Diffusion (SD)-based models, Depth Anything V2 offers faster inference, fewer parameters, and higher depth accuracy. The model is released at multiple scales (25M to 1.3B parameters) to support diverse applications. The authors also introduce a new evaluation benchmark, DA-2K, which provides precise annotations and diverse scenes to facilitate future research. The paper discusses the limitations of real labeled data and the benefits of synthetic data, emphasizing the role of large-scale unlabeled real images in bridging the synthetic-to-real domain gap and enhancing scene coverage. Experimental results demonstrate superior performance on zero-shot relative depth estimation and metric depth estimation tasks.
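
To make the three-stage training recipe concrete, the following minimal Python sketch mirrors its structure: train a teacher on synthetic labeled images, pseudo-label unlabeled real images with that teacher, then train students of several sizes on the pseudo-labels. All names (`build_mde_model`, `train`, the toy arrays) are hypothetical placeholders standing in for the authors' actual code and data; only the teacher → pseudo-label → student flow follows the paper.

```python
import numpy as np

# --- Hypothetical stand-ins; none of these names come from the paper's code. ---

def build_mde_model(size: str):
    """Stub for an MDE network at a given capacity (e.g. 'small' ... 'giant')."""
    class StubModel:
        def predict(self, image: np.ndarray) -> np.ndarray:
            # A real model would return a relative (affine-invariant) depth map.
            return np.zeros(image.shape[:2], dtype=np.float32)
    return StubModel()

def train(model, images, depths):
    """Stub for supervised training with a relative-depth loss."""
    pass

# Toy arrays in place of the paper's synthetic training set and its
# large-scale pool of unlabeled real images.
synthetic_images = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(4)]
synthetic_depths = [np.zeros((480, 640), dtype=np.float32) for _ in range(4)]
unlabeled_real_images = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(8)]

# Stage 1: train the largest (teacher) model purely on synthetic images,
# whose rendered depth labels are precise and complete.
teacher = build_mde_model(size="giant")
train(teacher, synthetic_images, synthetic_depths)

# Stage 2: pseudo-label the unlabeled real images with the teacher,
# bridging the synthetic-to-real domain gap and widening scene coverage.
pseudo_depths = [teacher.predict(img) for img in unlabeled_real_images]

# Stage 3: train student models of various sizes on the pseudo-labeled
# real images, yielding the released 25M to 1.3B parameter variants.
for size in ("small", "base", "large", "giant"):
    student = build_mde_model(size=size)
    train(student, unlabeled_real_images, pseudo_depths)
```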