17 Dec 2019 | Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, D. Sculley, Sebastian Nowozin, Joshua V. Dillon, Balaji Lakshminarayanan, Jasper Snoek
The paper "Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift" by Yaniv Ovadia addresses the critical issue of quantifying predictive uncertainty in modern machine learning models, particularly in the context of dataset shift. Dataset shift occurs when the input distribution used to evaluate a model's predictions differs from the distribution from which the model was trained, often due to factors like sample bias or non-stationarity. The authors present a large-scale benchmark to evaluate various probabilistic deep learning methods, including Bayesian and non-Bayesian approaches, under dataset shift conditions. They find that traditional post-hoc calibration methods and several other existing methods fall short in providing reliable uncertainty estimates under dataset shift. However, some methods that marginalize over models, such as ensembles, perform surprisingly well across a wide range of tasks. The paper also discusses the limitations and drawbacks of different metrics used to evaluate uncertainty, such as the Brier score and expected calibration error (ECE), and provides detailed results on image, text, and categorical data modalities. The authors conclude by highlighting the importance of robust uncertainty estimation in real-world applications and the need for more research to improve the performance of existing methods under dataset shift.The paper "Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift" by Yaniv Ovadia addresses the critical issue of quantifying predictive uncertainty in modern machine learning models, particularly in the context of dataset shift. Dataset shift occurs when the input distribution used to evaluate a model's predictions differs from the distribution from which the model was trained, often due to factors like sample bias or non-stationarity. The authors present a large-scale benchmark to evaluate various probabilistic deep learning methods, including Bayesian and non-Bayesian approaches, under dataset shift conditions. They find that traditional post-hoc calibration methods and several other existing methods fall short in providing reliable uncertainty estimates under dataset shift. However, some methods that marginalize over models, such as ensembles, perform surprisingly well across a wide range of tasks. The paper also discusses the limitations and drawbacks of different metrics used to evaluate uncertainty, such as the Brier score and expected calibration error (ECE), and provides detailed results on image, text, and categorical data modalities. The authors conclude by highlighting the importance of robust uncertainty estimation in real-world applications and the need for more research to improve the performance of existing methods under dataset shift.