Robust Q-learning Algorithm for Markov Decision Processes under Wasserstein Uncertainty


21 Jun 2024 | Ariel Neufeld, Julian Sester
The paper presents a novel Q-learning algorithm for solving distributionally robust Markov decision problems (MDPs) in which the ambiguity set of transition probabilities is a Wasserstein ball around a reference measure. The authors prove that the algorithm converges and illustrate its performance with numerical examples based on real data. The algorithm combines the dynamic programming principle with convex duality results for worst-case expectations over a Wasserstein ball. Because the Wasserstein distance is defined between measures with different supports, the ambiguity set can contain probability measures that do not share the support of the reference measure, which is a key advantage over ambiguity sets based on the Kullback-Leibler (KL) divergence. The paper also discusses the benefit of distributionally robust MDPs when the estimated distributions are misspecified, showing that the robust Q-learning algorithm can outperform non-robust approaches in such scenarios.
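To make the duality step more concrete, the following is a minimal sketch, not the authors' implementation, of how a worst-case expectation over a Wasserstein-1 ball can be approximated on a finite state space. It assumes the standard dual form sup over lambda >= 0 of E_ref[min_y(f(y) + lambda * c(x, y))] - lambda * eps, and replaces the supremum by a search over a grid of lambda values, so the result is a lower bound on the dual value. The function name `worst_case_expectation`, the toy states, the cost matrix, and the grid are all hypothetical choices made for illustration.

```python
import numpy as np


def worst_case_expectation(values, ref_probs, cost, eps, lambdas):
    """Approximate inf over {P : W_1(P, ref) <= eps} of E_P[values] via the dual
    sup_{lam >= 0} E_ref[min_y(values[y] + lam * cost[x, y])] - lam * eps.
    The supremum is taken over the finite grid `lambdas` (an assumption)."""
    best = -np.inf
    for lam in lambdas:
        # For each reference state x, the cheapest trade-off between a lower
        # value at y and the transport cost of moving mass from x to y.
        transformed = np.min(values[None, :] + lam * cost, axis=1)
        best = max(best, float(ref_probs @ transformed) - lam * eps)
    return best


# Hypothetical toy problem: three states on a line, all reference mass on state 2.
states = np.array([0.0, 1.0, 2.0])
cost = np.abs(states[:, None] - states[None, :])  # cost[x, y] = |x - y|
values = np.array([0.0, 1.0, 2.0])                # function whose expectation is minimized
ref_probs = np.array([0.0, 0.0, 1.0])
lambdas = np.linspace(0.0, 10.0, 101)

for eps in (0.0, 1.0, 5.0):
    print(eps, worst_case_expectation(values, ref_probs, cost, eps, lambdas))
```

In this toy setting the reference measure puts no mass on states 0 and 1, yet for a positive radius the worst case is attained by shifting mass onto them, which a KL ball around the same reference measure could not do. In a robust Bellman-type update, `values` would plausibly be replaced by the reward plus the discounted maximum of the current Q-estimates over actions; that wiring is an assumption here and is not taken from the paper.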