Distributionally Robust Optimization and Robust Statistics


January 29, 2024 | Jose Blanchet, Jiajin Li, Sirui Lin, Xuhui Zhang
The paper reviews distributionally robust optimization (DRO), a framework for constructing statistical estimators that hedge against deviations in the expected loss between the training and deployment environments. Many well-known estimators in statistics and machine learning, such as AdaBoost, LASSO, ridge regression, and dropout training, are distributionally robust in a precise sense. The authors aim to bridge the gap between these classical results and their equivalent DRO formulations, making DRO accessible to statisticians unfamiliar with the field.

They also clarify the difference between DRO and classical statistical robustness: DRO focuses on post-decision environment shifts, leading to a min-max formulation, while classical robust statistics addresses pre-decision contamination, resulting in a min-min formulation.

The paper discusses various formulations of DRO, including those based on φ-divergence, optimal transport, and integral probability metrics. It provides theoretical foundations and evidence from diverse applications, demonstrating how DRO recovers and extends existing methods. The authors also explore the statistical properties of DRO estimators, such as asymptotic normality and finite-sample guarantees, and discuss tractability and Bayesian interpretations.

In the context of robust statistics, the paper examines the challenges posed by data contamination, the main contamination models, and the robustness criteria used to assess estimators. Comparing the two frameworks, the authors note that DRO is more pessimistic, treating the deployment environment adversarially, while robust statistics is more optimistic. The paper concludes with a discussion of trending topics in DRO, including dynamic decision-making problems and causal inference.
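As a rough sketch of the min-max versus min-min contrast described above (written in generic notation that is not necessarily the paper's own: ℓ(θ; Z) is a loss, P_n the empirical distribution, D a discrepancy such as a φ-divergence or an optimal-transport cost, and δ an uncertainty budget), the two formulations can be compared as follows:

```latex
% Sketch only: generic notation, not drawn verbatim from the paper.
\begin{align*}
  % DRO: hedge against post-decision distribution shifts (min-max)
  \min_{\theta}\ \sup_{P \,:\, D(P, P_n) \le \delta}\ \mathbb{E}_{P}\bigl[\ell(\theta; Z)\bigr],
  \\[4pt]
  % Classical robust statistics: pre-decision contamination,
  % fit the most favorable distribution compatible with the data (min-min)
  \min_{\theta}\ \inf_{P \,:\, D(P, P_n) \le \delta}\ \mathbb{E}_{P}\bigl[\ell(\theta; Z)\bigr].
\end{align*}
```

Under this reading, the DRO estimator guards against the worst deployment distribution inside the neighborhood, whereas the robust-statistics formulation treats the observed data as contaminated and trusts that some distribution in the neighborhood is the clean one, which is consistent with the paper's characterization of DRO as the more pessimistic and robust statistics as the more optimistic of the two.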