Markov games as a framework for multi-agent reinforcement learning


Michael L. Littman
This paper introduces Markov games as a framework for multi-agent reinforcement learning, focusing on two-player zero-sum games. In a traditional Markov decision process (MDP), the environment is assumed to be stationary and to contain no other adaptive agents; Markov games extend this framework to multiple adaptive agents with interacting or competing goals.

A Markov game is an environment with multiple agents, each with its own action set and reward function, and each aiming to maximize its expected sum of discounted rewards. In a two-player zero-sum game, one agent tries to maximize a reward that the other tries to minimize, which leads to a minimax criterion: each agent maximizes its minimum expected reward, that is, it plays as well as possible against the opponent's best counter-play. A notable consequence is that optimal policies in Markov games can be probabilistic, unlike in MDPs, where a deterministic optimal policy always exists.

The paper then presents methods for finding optimal policies, including value iteration and a Q-learning-like algorithm (minimax-Q). Standard Q-learning is adapted to Markov games by replacing the max operator in the update with a minimax operator, which can be evaluated using linear programming.
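To make the adaptation concrete, here is a minimal sketch, not the paper's code, of the two ingredients described above: a minimax operator evaluated with an off-the-shelf linear-programming solver, and a Q-learning-style backup that uses it in place of the usual max. The table layout (a dictionary mapping each state to an action-by-opponent-action matrix), the function names, and the learning-rate and discount defaults are choices made for this illustration only.

```python
# Sketch of minimax-Q's two building blocks; requires numpy and scipy.
import numpy as np
from scipy.optimize import linprog

def minimax_value_and_policy(q_matrix):
    """Solve max_pi min_o sum_a pi[a] * q_matrix[a, o] as a linear program.

    q_matrix[a, o] is the estimated return for taking action a while the
    opponent takes action o in the current state.
    Returns (game value, mixed policy over our actions).
    """
    n_actions, n_opponent = q_matrix.shape
    # Decision variables: pi[0..n_actions-1] and the scalar game value v.
    c = np.zeros(n_actions + 1)
    c[-1] = -1.0                                   # linprog minimizes, so minimize -v
    # For every opponent action o: v - sum_a pi[a] * Q[a, o] <= 0
    A_ub = np.hstack([-q_matrix.T, np.ones((n_opponent, 1))])
    b_ub = np.zeros(n_opponent)
    # Probabilities sum to one (the value variable is excluded).
    A_eq = np.ones((1, n_actions + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_actions + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1], res.x[:n_actions]

def minimax_q_update(Q, s, a, o, reward, s_next, alpha=0.1, gamma=0.9):
    """One minimax-Q style backup.

    Like Q-learning, except the max over next-state action values is
    replaced by the minimax value of the next state's game matrix.
    Q is assumed to be a dict of (actions x opponent actions) arrays,
    a storage choice made for this sketch.
    """
    v_next, _ = minimax_value_and_policy(Q[s_next])
    Q[s][a, o] += alpha * (reward + gamma * v_next - Q[s][a, o])
```

The only change relative to ordinary Q-learning is the backup target: instead of the max over the next state's action values, it uses the value of the matrix game at the next state.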
The paper also reports experiments with minimax-Q on a simple two-player zero-sum game modeled after soccer. The results show that minimax-Q outperforms traditional Q-learning in certain scenarios, particularly against opponents that are not simply random. The paper concludes that Markov games provide a useful framework for multi-agent reinforcement learning and that further research is needed to explore the approach in more complex environments.
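The fact that optimal policies can be probabilistic is easy to see in a one-state example. The game below, matching pennies, is an illustration chosen for this summary rather than the game studied in the paper: the minimax-optimal policy mixes both actions with equal probability, while any deterministic choice can be exploited by an opponent that is not simply random.

```python
# Illustrative only: matching pennies as a one-state zero-sum game.
import numpy as np
from scipy.optimize import linprog

# Payoff to the maximizing player: rows are our actions, columns the opponent's.
Q = np.array([[+1.0, -1.0],
              [-1.0, +1.0]])

n = Q.shape[0]
c = np.zeros(n + 1)
c[-1] = -1.0                                         # maximize the game value v
A_ub = np.hstack([-Q.T, np.ones((Q.shape[1], 1))])   # enforce v <= sum_a pi[a] * Q[a, o]
A_eq = np.ones((1, n + 1))
A_eq[0, -1] = 0.0                                    # probabilities sum to 1
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(Q.shape[1]),
              A_eq=A_eq, b_eq=[1.0],
              bounds=[(0.0, 1.0)] * n + [(None, None)], method="highs")

print("minimax policy:", res.x[:n])    # approximately [0.5, 0.5]
print("game value:", res.x[-1])        # 0.0
print("worst case of always playing action 0:", Q[0].min())  # -1.0
```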