Deep Recurrent Q-Learning for Partially Observable MDPs

11 Jan 2017 | Matthew Hausknecht and Peter Stone
This paper introduces the Deep Recurrent Q-Network (DRQN), a modification of the Deep Q-Network (DQN) that adds a Long Short-Term Memory (LSTM) layer to cope with partial observability, i.e., with Partially Observable Markov Decision Processes (POMDPs). DRQN replaces DQN's first fully connected layer with an LSTM layer, allowing it to integrate information across time steps rather than relying only on the current frame, and thereby to learn policies when individual observations are incomplete.

DRQN was evaluated on standard Atari games and on their "flickering" equivalents, which are POMDPs in which each game screen is obscured with some probability. In standard Atari games, where the full state is effectively observable, DRQN performs similarly to DQN. In the flickering games, where observations are incomplete, DRQN outperforms DQN, demonstrating its ability to handle partial observability. When trained with full observations and evaluated with partial observations, DRQN's performance degrades less than DQN's, indicating that recurrence is a viable alternative to stacking frames in DQN's input layer. The paper also shows that policies trained with partial observations generalize to complete observations: in the Flickering Pong domain, DRQN's performance scales with the observability of the domain, reaching near-perfect levels when every game screen is observed.

This suggests that the recurrent network learns policies that are both robust to missing information and able to exploit increased observability. While recurrence provides no systematic advantage in learning to play the games, it allows the network to adapt better at evaluation time when the quality of observations changes. The paper concludes that recurrence is a viable way to handle partial observations, though it confers no systematic benefit over stacking observations in the input layer of a convolutional network. Future work could identify the characteristics of specific games that make recurrence effective.
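As a concrete illustration of the architecture change described above, the sketch below shows a DRQN-style network in PyTorch: DQN's convolutional stack is kept, its first fully connected layer is swapped for an LSTM, and a linear head maps the LSTM output to Q-values. The single-frame 84x84 input, the 512-unit hidden size, and the layer hyperparameters here are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class DRQN(nn.Module):
    def __init__(self, num_actions: int, hidden_size: int = 512):
        super().__init__()
        # DQN-style convolutional stack over a single 84x84 grayscale frame
        # (sizes assumed here; DRQN processes one frame per time step).
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # DQN's first fully connected layer is replaced by an LSTM, which
        # carries a hidden state and so integrates information across frames.
        self.lstm = nn.LSTM(input_size=64 * 7 * 7, hidden_size=hidden_size,
                            batch_first=True)
        self.q_head = nn.Linear(hidden_size, num_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 1, 84, 84) -- a sequence of single frames.
        b, t = frames.shape[:2]
        feats = self.conv(frames.reshape(b * t, *frames.shape[2:]))
        out, hidden = self.lstm(feats.reshape(b, t, -1), hidden)
        # Q-values for every action at every time step, plus the LSTM state.
        return self.q_head(out), hidden


# A batch of 4 sequences, each 10 frames long, for an 18-action Atari game.
q_values, _ = DRQN(num_actions=18)(torch.zeros(4, 10, 1, 84, 84))
print(q_values.shape)  # torch.Size([4, 10, 18])
```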
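The flickering games discussed above can be approximated with a simple observation wrapper that blanks out each screen with some probability (the paper obscures each frame with probability 0.5). The sketch below uses the gymnasium API; the class name FlickerWrapper and the ALE/Pong-v5 environment id (which requires ale-py) are assumptions for illustration.

```python
import numpy as np
import gymnasium as gym


class FlickerWrapper(gym.ObservationWrapper):
    """Replace each observation with a blank screen with probability obscure_prob."""

    def __init__(self, env: gym.Env, obscure_prob: float = 0.5):
        super().__init__(env)
        self.obscure_prob = obscure_prob

    def observation(self, obs):
        # Obscure the entire screen with the given probability; otherwise pass it through.
        if np.random.rand() < self.obscure_prob:
            return np.zeros_like(obs)
        return obs


# Usage (assumes ale-py is installed): turn Pong into Flickering Pong,
# where roughly half of the frames are visible.
env = FlickerWrapper(gym.make("ALE/Pong-v5"), obscure_prob=0.5)
```

Training a recurrent agent against such a wrapper forces it to rely on its hidden state to compensate for the missing frames, which is the effect the flickering experiments are designed to probe.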