2020 | René Ranftl*, Katrin Lasinger*, David Hafner, Konrad Schindler, and Vladlen Koltun
The paper "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" by René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun addresses monocular depth estimation, a task whose success depends heavily on large and diverse training sets. The authors propose a robust training objective that is invariant to changes in depth range and scale, advocate principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. They experiment with five diverse training datasets, including a new dataset built from 3D films, and demonstrate the generalization power of their approach through zero-shot cross-dataset transfer, that is, by evaluating on datasets that were not seen during training.
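Multi-objective learning treats each training dataset as a separate objective rather than folding all of them into one fixed-weight sum. The sketch below shows one standard way to realize this, the min-norm (MGDA-style) gradient combination, under simplifying assumptions: exactly two datasets and gradients given as flat lists of floats. For two objectives, the convex weight α minimizing ‖α g₁ + (1 − α) g₂‖² has a closed form. This is illustrative and not necessarily the paper's exact procedure; `min_norm_weight` and `combine_gradients` are hypothetical names, and with more datasets (the paper uses five) an iterative solver would replace the closed form.

```python
"""Min-norm combination of two dataset gradients (MGDA-style sketch, hypothetical helpers)."""
from typing import List, Tuple


def min_norm_weight(g1: List[float], g2: List[float]) -> float:
    """Return alpha in [0, 1] minimizing ||alpha * g1 + (1 - alpha) * g2||^2."""
    if len(g1) != len(g2):
        raise ValueError("gradients must have the same length")
    denom = sum((a - b) ** 2 for a, b in zip(g1, g2))
    if denom == 0.0:
        return 0.5  # identical gradients: every convex weight gives the same vector
    # Unconstrained minimizer of the quadratic in alpha, then clipped to [0, 1].
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    return min(1.0, max(0.0, alpha))


def combine_gradients(g1: List[float], g2: List[float]) -> Tuple[float, List[float]]:
    """Weight the two per-dataset gradients by the min-norm convex combination."""
    alpha = min_norm_weight(g1, g2)
    combined = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, combined
```

For example, `combine_gradients([1.0, 0.0], [0.0, 1.0])` returns `(0.5, [0.5, 0.5])`. For directly conflicting gradients such as `[1.0, 0.0]` and `[-1.0, 0.0]`, the combined vector is zero, which signals a Pareto-stationary point. Whenever the combined vector is nonzero, stepping against it decreases both dataset losses to first order.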
The results show that mixing data from complementary sources greatly improves monocular depth estimation: the approach outperforms competing methods across diverse test datasets and sets a new state of the art for the task. The paper also discusses the challenges of training on such diverse datasets, whose ground truth comes in incompatible forms (for example, depth known only up to scale, or disparity known only up to scale and shift), and the remedies it adopts: scale- and shift-invariant losses and high-capacity encoders.
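The scale- and shift-invariant losses can be sketched as follows, assuming that each image's prediction and ground truth are flat lists of disparity (inverse depth) values over valid pixels. Masking, batching, and the paper's additional gradient-matching regularizer are omitted. `ssi_mse_loss` first aligns the prediction to the ground truth with the least-squares scale s and shift t. `ssi_trimmed_mae` is a robust variant that normalizes both maps by their median and mean absolute deviation and discards the largest 20% of residuals. The function names, the `keep` parameter, and the 1/(2M) normalization are illustrative assumptions rather than the authors' code.

```python
"""Sketch of scale- and shift-invariant depth losses (illustrative, not the authors' code).

Inputs are flat lists of disparity values for the valid pixels of one image.
"""
import statistics
from typing import List, Tuple


def align_scale_shift(pred: List[float], target: List[float]) -> Tuple[float, float]:
    """Least-squares (s, t) minimizing sum_i (s * pred_i + t - target_i) ** 2."""
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (q - mean_t) for p, q in zip(pred, target))
    # A constant prediction carries no scale information; fall back to s = 0.
    s = cov / var_p if var_p > 0 else 0.0
    t = mean_t - s * mean_p
    return s, t


def ssi_mse_loss(pred: List[float], target: List[float]) -> float:
    """Squared error after least-squares alignment, normalized by 1/(2M) (assumed)."""
    s, t = align_scale_shift(pred, target)
    m = len(pred)
    return sum((s * p + t - q) ** 2 for p, q in zip(pred, target)) / (2 * m)


def normalize_median_mad(values: List[float]) -> List[float]:
    """Shift by the median and divide by the mean absolute deviation from it."""
    med = statistics.median(values)
    mad = sum(abs(v - med) for v in values) / len(values)
    # Guard against constant maps, whose deviation is zero.
    scale = mad if mad > 0 else 1.0
    return [(v - med) / scale for v in values]


def ssi_trimmed_mae(pred: List[float], target: List[float], keep: float = 0.8) -> float:
    """Absolute error of median/MAD-normalized maps, ignoring the largest residuals."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    p_hat = normalize_median_mad(pred)
    q_hat = normalize_median_mad(target)
    residuals = sorted(abs(a - b) for a, b in zip(p_hat, q_hat))
    m = len(residuals)
    kept = residuals[: int(keep * m)]  # drop the (1 - keep) fraction of largest errors
    return sum(kept) / (2 * m)
```

Because the least-squares alignment absorbs any affine change of the prediction with nonzero scale, `ssi_mse_loss(pred, target)` and `ssi_mse_loss([5 * p - 3 for p in pred], target)` agree up to floating-point error. The median/MAD variant is invariant only to positive scalings, but it is less sensitive to outliers, which is the motivation for trimming.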