Value-Decomposition Networks For Cooperative Multi-Agent Learning

16 Jun 2017 | Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel
The paper "Value-Decomposition Networks For Cooperative Multi-Agent Learning" by Peter Sunehag et al. addresses the challenge of cooperative multi-agent reinforcement learning (MARL) with a single joint reward signal. The authors identify issues with both fully centralized and decentralized approaches, such as spurious rewards and the "lazy agent" problem due to partial observability. To tackle these problems, they introduce a novel value decomposition network architecture that learns to decompose the team value function into agent-wise value functions. This approach avoids spurious rewards and allows for more effective learning, especially when combined with techniques like weight sharing, role information, and information channels. The experimental evaluation across various partially observable multi-agent domains demonstrates that the value decomposition network outperforms centralized and fully independent learners, providing a more robust solution for cooperative MARL problems.
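The core idea summarized above is that the joint action-value function is represented as a sum of per-agent value functions, each conditioned only on that agent's local observation. A minimal NumPy sketch of this additive decomposition follows; the per-agent "networks" are stubbed as random linear maps purely for illustration (the paper uses recurrent deep networks), and all names and shapes here are assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for per-agent Q-networks: each maps a local
# observation to Q-values over that agent's individual actions.
def agent_q_values(obs, weights):
    """Return Q-values over one agent's actions from its local observation."""
    return obs @ weights

n_agents, obs_dim, n_actions = 2, 4, 3
weights = [rng.normal(size=(obs_dim, n_actions)) for _ in range(n_agents)]
observations = [rng.normal(size=obs_dim) for _ in range(n_agents)]

# Per-agent value functions Q_i(o_i, a_i)
per_agent_q = [agent_q_values(o, w) for o, w in zip(observations, weights)]

def joint_q(per_agent_q, joint_action):
    """Value decomposition: the team Q is the SUM of agent-wise Q values,
    evaluated at a joint action (a_1, ..., a_n)."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))

# Because the sum is monotone in each Q_i, decentralized greedy action
# selection (each agent argmaxes its own Q_i) maximizes the joint Q.
greedy_joint_action = [int(np.argmax(q)) for q in per_agent_q]
best = max(
    joint_q(per_agent_q, (a0, a1))
    for a0 in range(n_actions)
    for a1 in range(n_actions)
)
assert np.isclose(joint_q(per_agent_q, greedy_joint_action), best)
```

The final assertion illustrates why the decomposition enables decentralized execution: maximizing a sum of functions of disjoint action variables is achieved by maximizing each summand independently, so each agent can act greedily on its own value function without communication at test time.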