Robust Q-learning Algorithm for Markov Decision Processes under Wasserstein Uncertainty

21 Jun 2024 | Ariel Neufeld, Julian Sester
This paper presents a novel Q-learning algorithm for solving distributionally robust Markov decision problems under Wasserstein uncertainty. The algorithm handles situations in which the transition probabilities of a Markov decision process are uncertain and are only known to lie within a Wasserstein ball around a reference measure. It is proven to converge to the optimal robust Q-value function, and its effectiveness is demonstrated through several examples using real data.

The key contribution is a Q-learning algorithm that incorporates distributional robustness, which is particularly useful when the estimated transition distributions are misspecified in practice. The algorithm relies on a convex duality result for worst-case expectations over a Wasserstein ball, which makes the inner optimization tractable. The paper also highlights the advantages of distributional robustness in stochastic optimal control problems, especially when the underlying process is subject to environmental shifts: the robust algorithm is shown to outperform non-robust Q-learning in scenarios where the assumed distribution is misspecified. Theoretical convergence guarantees are provided, and the algorithm's applicability is demonstrated through numerical examples.
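The duality-based worst-case evaluation described above can be sketched in a small tabular setting. The toy chain MDP, the Wasserstein radius `eps`, and the grid search over the dual variable `lam_grid` below are illustrative assumptions, not details taken from the paper; the sketch only shows the general pattern of replacing the sampled Bellman target with a dual worst-case target over a Wasserstein-1 ball around the observed next state.

```python
import numpy as np

# Hedged sketch of tabular robust Q-learning on a toy chain MDP.
# All environment details (chain dynamics, eps, lam_grid) are assumptions
# made for illustration, not the paper's experimental setup.

rng = np.random.default_rng(0)
n_states, n_actions, gamma, eps = 5, 2, 0.9, 0.1

# Reward depends on the next state: reaching the right end pays 1.
r_vec = np.where(np.arange(n_states) == n_states - 1, 1.0, 0.0)

# Pairwise ground distances between states, used as the transport cost.
dist = np.abs(np.arange(n_states)[:, None] - np.arange(n_states)[None, :])

def step(s, a):
    # Toy reference dynamics: action 0 drifts left, action 1 drifts right.
    drift = -1 if a == 0 else 1
    return int(np.clip(s + drift + rng.choice([-1, 0, 1]), 0, n_states - 1))

Q = np.zeros((n_states, n_actions))
lam_grid = np.linspace(0.01, 5.0, 50)  # grid search over the dual multiplier

def robust_target(s_next):
    # Dual form of the worst case (infimum) over a Wasserstein-1 ball of
    # radius eps centered at the observed sample s_next:
    #   sup_{lam >= 0}  -lam*eps + min_y [ v(y) + lam * d(s_next, y) ]
    v = r_vec + gamma * Q.max(axis=1)                      # value of each candidate y
    inner = (v + lam_grid[:, None] * dist[s_next]).min(axis=1)
    return (-lam_grid * eps + inner).max()

s = 0
for t in range(20000):
    # epsilon-greedy behavior policy on the reference dynamics
    a = int(rng.integers(n_actions)) if rng.random() < 0.2 else int(Q[s].argmax())
    s_next = step(s, a)
    alpha = 1.0 / (1 + 0.001 * t)                          # decaying step size
    Q[s, a] += alpha * (robust_target(s_next) - Q[s, a])
    s = s_next
```

Because the adversary can only move probability mass at a transport cost, the robust target at each sample is never larger than the nominal target at that sample; with `eps = 0` the update reduces to ordinary Q-learning.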