JUNE 1985 | Christos H. Papadimitriou and John N. Tsitsiklis
The paper investigates the computational complexity of computing optimal policies in Markov decision processes (MDPs). It shows that all three standard variants of the problem (finite horizon, infinite horizon discounted, and infinite horizon average cost) are P-complete, implying they are unlikely to admit highly parallel algorithms. In contrast, the deterministic versions of these problems lie in NC and can therefore be solved efficiently in parallel. The partially observed version of the finite horizon problem is shown to be PSPACE-complete, and is thus presumably even harder than NP-complete problems; the version with no observations is NP-complete. These results highlight the inherently sequential nature of the general MDP problem and the obstacles to solving it with parallel computation.
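The finite-horizon problem discussed above is classically solved by dynamic programming (backward induction), whose stage-by-stage sweep illustrates the sequential structure underlying the P-completeness result. Below is a minimal sketch of that algorithm; the two-state, two-action MDP used as input is invented purely for illustration and does not come from the paper.

```python
# Minimal sketch: backward induction for a finite-horizon MDP.
# The toy MDP at the bottom is an invented example, not from the paper.

def backward_induction(P, c, horizon):
    """P[a][s][t] = probability of moving from state s to t under action a;
    c[a][s] = immediate cost of action a in state s.
    Returns the optimal cost-to-go per state and the policy per stage."""
    n = len(c[0])                      # number of states
    V = [0.0] * n                      # terminal cost-to-go
    policy = []
    for _ in range(horizon):           # sweep backward over stages
        newV = [0.0] * n
        stage = [0] * n
        for s in range(n):
            # minimize expected cost over actions
            best_a, best_q = min(
                ((a, c[a][s] + sum(P[a][s][t] * V[t] for t in range(n)))
                 for a in range(len(c))),
                key=lambda x: x[1])
            newV[s], stage[s] = best_q, best_a
        V = newV
        policy.append(stage)
    policy.reverse()                   # policy[k] is the decision rule at stage k
    return V, policy

# Toy example: two states, two actions (numbers chosen arbitrarily).
P = [[[0.9, 0.1], [0.2, 0.8]],        # transitions under action 0
     [[0.5, 0.5], [0.7, 0.3]]]        # transitions under action 1
c = [[1.0, 2.0],                      # cost of action 0 in states 0, 1
     [0.5, 3.0]]                      # cost of action 1 in states 0, 1
V, policy = backward_induction(P, c, horizon=5)
```

Note that each stage's cost-to-go depends on the result of the previous sweep, so the computation is inherently a chain of dependent steps; this is exactly the kind of sequential dependence that P-completeness suggests cannot be parallelized away.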