6 Jun 2018 | Tabish Rashid * 1 Mikayel Samvelyan * 2 Christian Schroeder de Witt 1 Gregory Farquhar 1 Jakob Foerster 1 Shimon Whiteson 1
QMIX is a novel value-based method for training decentralized policies in a centralized end-to-end fashion. It addresses the challenge of extracting decentralized policies from centralized learning by enforcing a monotonic relationship between the centralized action-value function and the individual agent value functions. This ensures tractable maximization of the joint action-value in off-policy learning and guarantees consistency between centralized and decentralized policies. QMIX is evaluated on challenging StarCraft II micromanagement tasks, demonstrating superior performance compared to existing value-based multi-agent reinforcement learning methods. The method leverages extra state information available during training and allows for efficient exploitation of this information through a non-linear mixing network. Ablations show the necessity of conditioning on state information and non-linear mixing for consistent performance across tasks.QMIX is a novel value-based method for training decentralized policies in a centralized end-to-end fashion. It addresses the challenge of extracting decentralized policies from centralized learning by enforcing a monotonic relationship between the centralized action-value function and the individual agent value functions. This ensures tractable maximization of the joint action-value in off-policy learning and guarantees consistency between centralized and decentralized policies. QMIX is evaluated on challenging StarCraft II micromanagement tasks, demonstrating superior performance compared to existing value-based multi-agent reinforcement learning methods. The method leverages extra state information available during training and allows for efficient exploitation of this information through a non-linear mixing network. Ablations show the necessity of conditioning on state information and non-linear mixing for consistent performance across tasks.