26 Nov 2019 | David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy P. Lillicrap, Greg Wayne
This paper introduces CLEAR, a replay-based method for reducing catastrophic forgetting in multi-task reinforcement learning. CLEAR combines on-policy learning from novel experiences (for plasticity) and off-policy learning from replay experiences (for stability), along with behavioral cloning between the current policy and its past self. CLEAR performs better than state-of-the-art methods for mitigating forgetting, despite being significantly less complicated and not requiring any knowledge of the individual tasks being learned.
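To make the behavioral-cloning idea concrete, here is a minimal PyTorch sketch of the two cloning penalties applied to replayed data. It assumes the replay buffer stores the policy logits and value estimates produced when the experience was originally generated; the function name and tensor layout are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def cloning_losses(replay_logits, replay_values, current_logits, current_values):
    """Behavioral-cloning terms applied to replayed experience only (a sketch,
    not the paper's exact implementation).

    replay_logits / replay_values: network outputs stored in the buffer when
    the experience was generated.
    current_logits / current_values: outputs of the current network on the
    same replayed states.
    """
    # Policy cloning: KL divergence from the past policy to the current one,
    # discouraging the current policy from drifting on old states.
    past_log_probs = F.log_softmax(replay_logits, dim=-1)
    curr_log_probs = F.log_softmax(current_logits, dim=-1)
    policy_cloning = F.kl_div(curr_log_probs, past_log_probs.exp(),
                              reduction="batchmean")

    # Value cloning: L2 distance between current and stored value estimates.
    value_cloning = F.mse_loss(current_values, replay_values)

    return policy_cloning, value_cloning
```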
The problem of catastrophic forgetting in neural networks has long been recognized, and it is known that rehearsing past data can be a satisfactory antidote for some purposes. In recent years, there has been renewed interest in overcoming catastrophic forgetting in RL and in supervised learning from streaming data. Current strategies for mitigating catastrophic forgetting have primarily focused on schemes for protecting the parameters inferred in one task while training on another.
Rehearsing old data via experience replay buffers is a common technique in RL, but such methods have largely been driven by the goal of data-efficient learning on single tasks. Research in this vein has included prioritized replay for maximizing the impact of rare experiences, learning from human demonstration data seeded into a buffer, and methods for approximating replay buffers with generative models.
The CLEAR method uses actor-critic training on a mixture of new and replayed experiences, with distributed training based on the Importance Weighted Actor-Learner Architecture (IMPALA). The V-Trace off-policy learning algorithm corrects for the distribution shift between the behavior policy that generated the buffered experience and the current policy. Policy gradient and value losses are applied to both new and replayed experiences, while additional behavioral cloning terms, penalizing divergence from the stored policy outputs and value estimates, are applied to replayed experiences only.
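The off-policy correction boils down to truncated importance weights between the stored behavior policy and the current target policy. The sketch below, again in PyTorch, shows only this weighting step under the standard V-trace clipping thresholds; the full return recursion from the IMPALA paper is omitted, and the argument names are assumptions made for illustration.

```python
import torch

def vtrace_weights(behaviour_log_probs, target_log_probs,
                   rho_bar=1.0, c_bar=1.0):
    """Truncated importance weights used by V-trace to correct for the gap
    between the behavior policy (which generated the trajectory, possibly
    long ago in the replay buffer) and the current target policy.
    This is only the weighting step; the return recursion is omitted.
    """
    log_rhos = target_log_probs - behaviour_log_probs
    rhos = torch.exp(log_rhos)
    clipped_rhos = torch.clamp(rhos, max=rho_bar)  # weight the TD errors
    clipped_cs = torch.clamp(rhos, max=c_bar)      # weight the trace cutting
    return clipped_rhos, clipped_cs
```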
The results show that CLEAR significantly reduces catastrophic forgetting, performing comparably to or better than state-of-the-art methods, and it remains effective even when the replay buffer is severely constrained in size. Its cumulative performance is similar to that of networks trained on each task separately or on all tasks simultaneously, effectively eliminating catastrophic forgetting. CLEAR also holds up when a probe task is introduced only later in the training sequence.
CLEAR balances on- and off-policy learning, using a 50-50 split of new and replayed data during training. It degrades gracefully as memory shrinks: even substantially reduced buffers let CLEAR reach essentially the same performance. CLEAR outperforms methods such as Progress & Compress and Elastic Weight Consolidation, which, unlike CLEAR, require knowledge of task boundaries. Overall, CLEAR is a simple and effective method for preventing catastrophic forgetting in continual learning settings.
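As an illustration of the 50-50 mixing, the sketch below pairs each batch of fresh actor trajectories with an equal number of trajectories sampled uniformly from a fixed-capacity buffer. Reservoir insertion is one reasonable way to keep the buffer an approximately uniform sample of the whole history under a memory budget; the class and method names are invented for this example, not taken from the paper.

```python
import random

class ReplayMixer:
    """Sketch of 50-50 mixing of fresh and replayed trajectories."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def insert(self, trajectory):
        # Reservoir sampling: every trajectory seen so far has an equal
        # chance of being retained in the fixed-size buffer.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(trajectory)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = trajectory

    def training_batch(self, new_trajectories):
        # Half of each learner batch is fresh actor experience, half is
        # sampled uniformly from the replay buffer (the 50-50 split).
        if not self.buffer:
            return list(new_trajectories)
        k = min(len(new_trajectories), len(self.buffer))
        replayed = random.sample(self.buffer, k)
        return list(new_trajectories) + replayed
```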