27 Nov 2015 | Ardi Tampuu*, Tambet Matiisen*, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru†, Jaan Aru, Raul Vicente‡
This paper explores the use of Deep Q-Networks (DQN) in multiagent systems, focusing on how two agents interact in the classic video game Pong. It extends the DQN architecture proposed by Google DeepMind to multiagent environments and investigates how competitive and collaborative behaviors emerge under different rewarding schemes. Agents trained with competitive rewards learn to score efficiently, while those trained with collaborative rewards learn to keep the ball in play as long as possible. The study also traces the progression from competitive to collaborative behavior by gradually adjusting the reward given for scoring.
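As a concrete illustration of an adjustable rewarding scheme, the sketch below parameterizes the reward given to the scoring player with a single value `rho`; the parameter name and the exact interpolation are assumptions made for illustration, not details quoted from the paper.

```python
# Hypothetical sketch of a tunable Pong rewarding scheme. The parameter
# name (rho) and the interpolation are illustrative, not the paper's exact values.

def rewards_on_point(rho):
    """Return (reward_for_scorer, reward_for_conceder) when a point ends.

    rho = +1 corresponds to a fully competitive scheme (scoring is good),
    rho = -1 to a fully cooperative one (both agents are punished when the
    ball is lost); values in between interpolate between the two regimes.
    """
    reward_scorer = rho     # varies with the chosen scheme
    reward_conceder = -1.0  # losing the ball is always penalized
    return reward_scorer, reward_conceder

# Example: three schemes along the competitive-to-cooperative axis.
for rho in (1.0, 0.0, -1.0):
    print(rho, rewards_on_point(rho))
```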
The paper describes a multiagent system in which each agent is controlled by an independent DQN and all agents act in a shared environment, with Pong serving as the testbed. Pong is chosen because different reward schemes let it express both competitive and cooperative strategies, and the study shows that simply changing the rewarding scheme leads the agents to learn to either compete or collaborate.
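A minimal sketch of such decentralized learning, under assumed interfaces, is shown below: each agent holds its own Q-network and replay memory and only ever sees its own reward. The `QNetwork` placeholder and the commented `env.step` interface are hypothetical stand-ins, not the paper's implementation.

```python
# Minimal sketch of decentralized multiagent DQN training: two independent
# learners share one environment but keep separate networks and memories.
import random
from collections import deque

class QNetwork:
    """Stand-in for a convolutional Q-network over raw game frames."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
    def best_action(self, state):
        return random.randrange(self.n_actions)   # placeholder policy
    def train_on(self, batch):
        pass                                       # gradient step omitted

class Agent:
    def __init__(self, n_actions, memory_size=100_000):
        self.q = QNetwork(n_actions)
        self.memory = deque(maxlen=memory_size)
    def act(self, state, epsilon):
        if random.random() < epsilon:
            return random.randrange(self.q.n_actions)
        return self.q.best_action(state)
    def remember(self, transition):
        self.memory.append(transition)
    def learn(self, batch_size=32):
        if len(self.memory) >= batch_size:
            self.q.train_on(random.sample(self.memory, batch_size))

# Hypothetical training loop over a shared environment:
# left, right = Agent(3), Agent(3)
# state = env.reset()
# a_l, a_r = left.act(state, eps), right.act(state, eps)
# next_state, (r_l, r_r), done = env.step(a_l, a_r)
# left.remember((state, a_l, r_l, next_state, done)); left.learn()
# right.remember((state, a_r, r_r, next_state, done)); right.learn()
```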
The research also explores the evolution of behaviors in different reward scenarios. For instance, in a fully competitive setting, agents focus on scoring, while in a fully cooperative setting, agents aim to keep the ball in play. The study further investigates intermediate reward schemes that allow for a transition between competitive and cooperative behaviors. The results show that agents can adapt their strategies based on the reward structure, leading to different behaviors such as increased ball bounces, reduced wall bounces, and changes in serving time.
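The sketch below shows how such behavioral statistics could be aggregated from per-point event logs; the field names and values are purely illustrative, not measurements from the paper.

```python
# Illustrative computation of behavioral statistics from hypothetical
# per-point event logs (paddle bounces, wall bounces, serving duration).
from statistics import mean

points = [
    {"paddle_bounces": 7,  "wall_bounces": 2, "serving_frames": 40},
    {"paddle_bounces": 12, "wall_bounces": 1, "serving_frames": 35},
    {"paddle_bounces": 3,  "wall_bounces": 5, "serving_frames": 90},
]

stats = {
    "paddle_bounces_per_point": mean(p["paddle_bounces"] for p in points),
    "wall_bounces_per_point":   mean(p["wall_bounces"] for p in points),
    "avg_serving_time_frames":  mean(p["serving_frames"] for p in points),
}
print(stats)
```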
The paper also discusses the training procedure, including the exploration rate schedule and the monitoring of Q-values to assess the agents' learning progress. The study highlights the effectiveness of DQNs for decentralized learning in multiagent systems and concludes that they are a practical tool for studying how such systems learn in complex environments, with cooperative or competitive strategies emerging depending on the reward structure.
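For the training procedure, a minimal sketch of a linearly annealed epsilon-greedy exploration rate and a simple average max-Q monitor is given below; the specific constants are assumptions, not the paper's hyperparameters.

```python
# Sketch of an epsilon-greedy exploration schedule and a simple Q-value
# monitor; the numbers used here are assumptions, not the paper's settings.

def epsilon_at(step, eps_start=1.0, eps_final=0.05, anneal_steps=1_000_000):
    """Linearly anneal exploration from eps_start down to eps_final."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_final - eps_start)

def average_max_q(q_values_per_state):
    """Track learning progress as the mean of max-Q over held-out states."""
    return sum(max(qs) for qs in q_values_per_state) / len(q_values_per_state)

print(epsilon_at(0), epsilon_at(500_000), epsilon_at(2_000_000))
print(average_max_q([[0.1, 0.4, 0.2], [0.3, 0.0, 0.5]]))
```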