This paper introduces and analyzes a K-step look-ahead approach for Value Iteration (VI) algorithms used to solve both discounted and undiscounted Markov Decision Processes (MDPs). The approach combines value-oriented concepts with adaptive relaxation factors (ARFs) to accelerate VI procedures, and the combination performs better than either concept used alone. The method is evaluated computationally, and practical guidelines for implementation are suggested. It is also shown that incorporating Phase 0, Action Elimination, and Parallel Processing can further enhance the method. The approach has been successfully applied to real-world problems, and numerical results support its superiority, especially in the undiscounted case.
The K-step look-ahead approach looks ahead K value-oriented steps while applying adaptive relaxation factors. For discounted MDPs, four main VIA schemes are analyzed: Pre-Jacobi (PJ), Jacobi (J), Pre-Gauss-Seidel (PGS), and Gauss-Seidel (GS). Similar analyses are carried out for undiscounted MDPs and Semi-MDPs (SMDPs). The method uses a recursive scheme to estimate future values and incorporates relaxation factors to improve convergence.
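As a concrete illustration of one accelerated sweep, the sketch below combines a Pre-Jacobi-style maximization step, K-1 value-oriented backups under the fixed greedy policy, and a relaxed update. It is a minimal sketch under the usual finite discounted-MDP setup (rewards r[s, a], transitions P[s, a, s']); the function name, the fixed relaxation factor omega, and the array layout are illustrative rather than the paper's exact scheme or notation.

```python
import numpy as np

def vi_value_oriented_sweep(V, r, P, beta, K=3, omega=1.0):
    """One accelerated VI sweep: greedy backup, K-1 value-oriented steps,
    then a relaxed update. V: (S,), r: (S, A), P: (S, A, S), 0 < beta < 1."""
    S, A = r.shape
    # Policy-improvement step: one standard (Pre-Jacobi) value-iteration backup.
    Q = r + beta * np.einsum('sat,t->sa', P, V)   # Q[s, a]
    policy = Q.argmax(axis=1)
    V_new = Q.max(axis=1)
    # Value-oriented look-ahead: K-1 extra backups under the fixed greedy
    # policy (no maximization), pushing V toward that policy's value.
    idx = np.arange(S)
    r_pi = r[idx, policy]          # rewards under the greedy policy, (S,)
    P_pi = P[idx, policy]          # transition matrix under that policy, (S, S)
    for _ in range(K - 1):
        V_new = r_pi + beta * P_pi @ V_new
    # Relaxed update: omega = 1 is the plain update, omega > 1 over-relaxes.
    return V + omega * (V_new - V), policy
```

Iterating such sweeps, with omega chosen adaptively from the observed differences, gives the accelerated VI loop; a span-based stopping test is sketched after the next paragraph.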
Computational considerations show that the K-step look-ahead approach shrinks the span of the successive differences (the gap between the maximum and minimum componentwise changes in the value estimates), leading to faster convergence and fewer iterations. The effectiveness of the method depends on the choice of K and of the relaxation factors. The paper also discusses parallel processing and memory optimization for efficient implementation.
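To make the role of the delta span concrete, here is a minimal sketch of a span-based stopping test together with the standard MacQueen-style bounds on the optimal value for a discounted problem (beta < 1). The function name and tolerance handling are illustrative assumptions, not the paper's ARF formulas.

```python
import numpy as np

def span_test_and_bounds(V_prev, V_curr, beta, eps=1e-6):
    """Convergence check based on the spread of successive differences.

    delta[s] = V_curr[s] - V_prev[s]. For a discounted MDP the standard
    MacQueen-type bounds give
        V_curr + beta/(1-beta) * delta.min() <= V* <= V_curr + beta/(1-beta) * delta.max(),
    so iteration can stop once the span delta.max() - delta.min() is small.
    (For undiscounted, average-reward problems the span itself is the
    natural stopping quantity.)
    """
    delta = V_curr - V_prev
    span = delta.max() - delta.min()
    lower = V_curr + beta / (1.0 - beta) * delta.min()
    upper = V_curr + beta / (1.0 - beta) * delta.max()
    return span < eps, lower, upper
```

The faster the span shrinks, the sooner these bounds pinch together, which is why acceleration schemes that reduce the span per sweep also reduce the total iteration count.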
Numerical results demonstrate that the proposed method, called Multiple Adaptive Relaxation with Value Oriented (MARVO), outperforms other VI variants, especially for high discount factors and large-scale problems. The method is particularly effective for undiscounted MDPs and has been successfully applied to real-world problems in telecommunications. The approach is a hybrid of value and policy iteration, offering significant computational savings and improved performance.