JUNE 1985 | Christos H. Papadimitriou and John N. Tsitsiklis
The paper by Christos H. Papadimitriou and John N. Tsitsiklis investigates the complexity of optimal policy computation in Markov decision processes (MDPs). The authors show that the classical problems of finite horizon, infinite horizon discounted, and infinite horizon average cost are P-complete, meaning they are likely not solvable by highly parallel algorithms. However, the deterministic versions of these problems can be solved efficiently in parallel. The partially observed state version of the finite horizon problem is shown to be PSPACE-complete, indicating it is even less likely to be solved in polynomial time. The problem with no observations is proven to be NP-complete. The paper also discusses the design of optimal strategies for MDPs and the complexity of parallel algorithms for these problems, providing detailed proofs and reductions to support the findings.The paper by Christos H. Papadimitriou and John N. Tsitsiklis investigates the complexity of optimal policy computation in Markov decision processes (MDPs). The authors show that the classical problems of finite horizon, infinite horizon discounted, and infinite horizon average cost are P-complete, meaning they are likely not solvable by highly parallel algorithms. However, the deterministic versions of these problems can be solved efficiently in parallel. The partially observed state version of the finite horizon problem is shown to be PSPACE-complete, indicating it is even less likely to be solved in polynomial time. The problem with no observations is proven to be NP-complete. The paper also discusses the design of optimal strategies for MDPs and the complexity of parallel algorithms for these problems, providing detailed proofs and reductions to support the findings.