The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games

4 Nov 2022 | Chao Yu*, Akash Velu*, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu
This paper explores the effectiveness of Proximal Policy Optimization (PPO) in cooperative multi-agent settings, a domain where it has been underutilized compared to off-policy methods. The authors conduct a comprehensive empirical study on four popular multi-agent benchmarks: the multi-agent particle world environments (MPE), the StarCraft multi-agent challenge (SMAC), Google Research Football (GRF), and the Hanabi challenge. They find that PPO-based algorithms achieve strong performance with minimal hyperparameter tuning and without domain-specific modifications or architectures. Compared with strong off-policy methods, PPO often achieves competitive or superior results in both final returns and sample efficiency. The paper also identifies and analyzes five critical implementation and hyperparameter factors that influence PPO's performance, providing concrete practical suggestions for improving its effectiveness. The results suggest that simple PPO-based methods can be strong baselines in cooperative multi-agent reinforcement learning. The source code for the experiments is available at <https://github.com/marlbenchmark/on-policy>.
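To make the PPO objective discussed above concrete, below is a minimal sketch of the clipped surrogate loss as it is typically used in multi-agent PPO variants, where advantages may come from a centralized critic. This is an illustrative PyTorch snippet under those assumptions, not the paper's actual implementation; the function and variable names are hypothetical.

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimized).

    log_probs:     log pi_theta(a|o) under the current policy, shape [batch]
    old_log_probs: log probs recorded when the data was collected, shape [batch]
    advantages:    advantage estimates (e.g. GAE from a centralized critic), shape [batch]
    """
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (minimum) surrogate, then negate for gradient descent.
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random data. In a MAPPO-style setup, agents would typically
# share actor parameters and the critic would condition on a global state.
if __name__ == "__main__":
    torch.manual_seed(0)
    logp = torch.randn(64, requires_grad=True)
    old_logp = logp.detach() + 0.1 * torch.randn(64)
    adv = torch.randn(64)
    loss = ppo_clip_loss(logp, old_logp, adv)
    loss.backward()
    print(loss.item())
```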