14 Mar 2020 | Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch
The paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments" by Ryan Lowe explores deep reinforcement learning methods for multi-agent domains. It addresses the challenges faced by traditional algorithms in multi-agent settings, such as the non-stationarity of the environment and the high variance of policy gradient methods. The authors propose an adaptation of actor-critic methods that considers the action policies of other agents, enabling the learning of complex multi-agent coordination policies. Additionally, they introduce a training regimen using an ensemble of policies for each agent to enhance the robustness of multi-agent policies. The approach is evaluated in both cooperative and competitive scenarios, demonstrating its effectiveness in discovering various physical and informational coordination strategies. The paper also discusses related work, provides a background on Markov Games and Q-Learning, and presents the detailed methodology, including the multi-agent actor-critic algorithm and techniques for inferring policies of other agents. Experimental results show that the proposed method outperforms traditional RL algorithms in a variety of cooperative and competitive multi-agent environments.The paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments" by Ryan Lowe explores deep reinforcement learning methods for multi-agent domains. It addresses the challenges faced by traditional algorithms in multi-agent settings, such as the non-stationarity of the environment and the high variance of policy gradient methods. The authors propose an adaptation of actor-critic methods that considers the action policies of other agents, enabling the learning of complex multi-agent coordination policies. Additionally, they introduce a training regimen using an ensemble of policies for each agent to enhance the robustness of multi-agent policies. The approach is evaluated in both cooperative and competitive scenarios, demonstrating its effectiveness in discovering various physical and informational coordination strategies. The paper also discusses related work, provides a background on Markov Games and Q-Learning, and presents the detailed methodology, including the multi-agent actor-critic algorithm and techniques for inferring policies of other agents. Experimental results show that the proposed method outperforms traditional RL algorithms in a variety of cooperative and competitive multi-agent environments.