The paper "MegaDepth: Learning Single-View Depth Prediction from Internet Photos" by Zhengqi Li and Noah Snavely addresses the challenge of single-view depth prediction, a fundamental problem in computer vision. Traditional methods rely on limited datasets from 3D sensors, which have limitations such as indoor-only images, small training sets, and sparse sampling. To overcome these issues, the authors propose using multi-view Internet photo collections, which provide virtually unlimited data, to generate training data through structure-from-motion (SfM) and multi-view stereo (MVS) methods. They introduce the MegaDepth (MD) dataset, which is created by reconstructing 200 3D models from well-photographed landmarks using SfM and MVS, and then refining the depth maps to remove noise and outliers. The authors also propose new methods for processing raw MVS output and automatically augmenting the data with ordinal depth relations derived from semantic segmentation. The MD dataset is evaluated on various datasets, including Make3D, KITTI, and DIW, demonstrating strong generalization to novel scenes and other diverse datasets. The paper highlights the effectiveness of using large amounts of diverse training data and the importance of data processing and loss function design in achieving accurate and generalizable depth predictions.The paper "MegaDepth: Learning Single-View Depth Prediction from Internet Photos" by Zhengqi Li and Noah Snavely addresses the challenge of single-view depth prediction, a fundamental problem in computer vision. Traditional methods rely on limited datasets from 3D sensors, which have limitations such as indoor-only images, small training sets, and sparse sampling. To overcome these issues, the authors propose using multi-view Internet photo collections, which provide virtually unlimited data, to generate training data through structure-from-motion (SfM) and multi-view stereo (MVS) methods. They introduce the MegaDepth (MD) dataset, which is created by reconstructing 200 3D models from well-photographed landmarks using SfM and MVS, and then refining the depth maps to remove noise and outliers. The authors also propose new methods for processing raw MVS output and automatically augmenting the data with ordinal depth relations derived from semantic segmentation. The MD dataset is evaluated on various datasets, including Make3D, KITTI, and DIW, demonstrating strong generalization to novel scenes and other diverse datasets. The paper highlights the effectiveness of using large amounts of diverse training data and the importance of data processing and loss function design in achieving accurate and generalizable depth predictions.