This paper introduces and analyzes a K-step look-ahead approach for Value Iteration (VI) algorithms used to solve both discounted and undiscounted Markov Decision Processes (MDPs). The approach combines value-oriented concepts with adaptive relaxation factors (ARFs) to accelerate VI procedures, and the combination performs better than either concept used alone. The method is evaluated computationally, and practical guidelines for implementation are suggested. It is also shown that incorporating Phase 0, Action Elimination, and Parallel Processing can further enhance the method. The approach has been successfully applied to real-world problems, and numerical results support its superiority, especially in the undiscounted case.
The K-step look-ahead approach looks ahead K value-oriented steps while applying adaptive relaxation factors. For discounted MDPs, four main VIA schemes are analyzed: Pre-Jacobi (PJ), Jacobi (J), Pre-Gauss-Seidel (PGS), and Gauss-Seidel (GS). Similar analyses are carried out for undiscounted MDPs and Semi-MDPs (SMDPs). The method uses a recursive scheme to estimate future values and incorporates relaxation factors to improve convergence.
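As a concrete illustration of one accelerated sweep, the sketch below combines a Pre-Jacobi-style maximization step, K-1 value-oriented backups under the fixed greedy policy, and a relaxed update. It is a minimal sketch under the usual finite discounted-MDP setup (rewards r[s, a], transitions P[s, a, s']); the function name, the fixed relaxation factor omega, and the array layout are illustrative rather than the paper's exact scheme or notation.

```python
import numpy as np

def vi_value_oriented_sweep(V, r, P, beta, K=3, omega=1.0):
    """One accelerated VI sweep: greedy backup, K-1 value-oriented steps,
    then a relaxed update. V: (S,), r: (S, A), P: (S, A, S), 0 < beta < 1."""
    S, A = r.shape
    # Policy-improvement step: one standard (Pre-Jacobi) value-iteration backup.
    Q = r + beta * np.einsum('sat,t->sa', P, V)   # Q[s, a]
    policy = Q.argmax(axis=1)
    V_new = Q.max(axis=1)
    # Value-oriented look-ahead: K-1 extra backups under the fixed greedy
    # policy (no maximization), pushing V toward that policy's value.
    idx = np.arange(S)
    r_pi = r[idx, policy]          # rewards under the greedy policy, (S,)
    P_pi = P[idx, policy]          # transition matrix under that policy, (S, S)
    for _ in range(K - 1):
        V_new = r_pi + beta * P_pi @ V_new
    # Relaxed update: omega = 1 is the plain update, omega > 1 over-relaxes.
    return V + omega * (V_new - V), policy
```

Iterating such sweeps, with omega chosen adaptively from the observed differences, gives the accelerated VI loop; a span-based stopping test is sketched after the next paragraph.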
Computational considerations show that the K-step look-ahead approach shrinks the span of the successive differences (the gap between the maximum and minimum componentwise changes in the value estimates), leading to faster convergence and fewer iterations. The effectiveness of the method depends on the choice of K and of the relaxation factors. The paper also discusses parallel processing and memory optimization for efficient implementation.
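To make the role of the delta span concrete, here is a minimal sketch of a span-based stopping test together with the standard MacQueen-style bounds on the optimal value for a discounted problem (beta < 1). The function name and tolerance handling are illustrative assumptions, not the paper's ARF formulas.

```python
import numpy as np

def span_test_and_bounds(V_prev, V_curr, beta, eps=1e-6):
    """Convergence check based on the spread of successive differences.

    delta[s] = V_curr[s] - V_prev[s]. For a discounted MDP the standard
    MacQueen-type bounds give
        V_curr + beta/(1-beta) * delta.min() <= V* <= V_curr + beta/(1-beta) * delta.max(),
    so iteration can stop once the span delta.max() - delta.min() is small.
    (For undiscounted, average-reward problems the span itself is the
    natural stopping quantity.)
    """
    delta = V_curr - V_prev
    span = delta.max() - delta.min()
    lower = V_curr + beta / (1.0 - beta) * delta.min()
    upper = V_curr + beta / (1.0 - beta) * delta.max()
    return span < eps, lower, upper
```

The faster the span shrinks, the sooner these bounds pinch together, which is why acceleration schemes that reduce the span per sweep also reduce the total iteration count.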
Numerical results demonstrate that the proposed method, called Multiple Adaptive Relaxation with Value Oriented (MARVO), outperforms other VI variants, especially for high discount factors and large-scale problems. The method is particularly effective for undiscounted MDPs and has been successfully applied to real-world problems in telecommunications. The approach is a hybrid of value and policy iteration, offering significant computational savings and improved performance.