Overcoming catastrophic forgetting in neural networks

25 Jan 2017 | James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell
This paper presents Elastic Weight Consolidation (EWC), a novel algorithm that enables neural networks to learn multiple tasks sequentially without catastrophic forgetting. The key idea is to protect the weights that are important for previously learned tasks by slowing down their learning. This approach is inspired by neurobiological mechanisms of synaptic consolidation, where the brain maintains previously acquired knowledge even after learning new tasks. EWC is implemented as a quadratic constraint that pulls weights back towards their previous values in proportion to their importance for past tasks, allowing the network to learn new tasks while retaining performance on older ones.

The algorithm is demonstrated on both supervised learning tasks (e.g., MNIST digit classification) and reinforcement learning tasks (e.g., Atari 2600 games). In supervised learning, EWC outperforms traditional methods like L2 regularization and plain SGD, maintaining performance on old tasks while learning new ones. In reinforcement learning, EWC enables a single network to learn multiple games sequentially, with only modest increases in error rates.

The algorithm is grounded in Bayesian principles: the posterior distribution over parameters after learning a task is approximated using the Fisher information matrix, which captures the uncertainty of each parameter and is used to determine which weights are most important for previous tasks. EWC is shown to be effective in both small and large networks, with computational efficiency achieved through approximations of the posterior distribution.

The paper also discusses the implications of EWC for understanding neurobiological mechanisms of learning and memory. It highlights parallels between EWC and computational theories of synaptic plasticity, suggesting that weight uncertainty should inform learning rates. The results demonstrate that EWC can be combined with deep neural networks to achieve successful performance in challenging domains, providing evidence that neurobiological theories of synaptic consolidation scale to large-scale learning systems.
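Concretely, when training on a new task B after having learned task A with parameters θ*_A, the paper's objective adds a Fisher-weighted quadratic penalty to the task-B loss. The line below restates that objective (Eq. 3 of the paper), where F_i is the i-th diagonal entry of the Fisher information matrix estimated on task A and λ sets how strongly old tasks are protected:

    L(θ) = L_B(θ) + Σ_i (λ/2) · F_i · (θ_i − θ*_{A,i})²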
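To make the two ingredients concrete, the following is a minimal sketch of a diagonal Fisher estimate and the resulting penalty, assuming PyTorch and a classification model. It is not the authors' code; the function names (estimate_diagonal_fisher, ewc_penalty) and the λ value in the usage comment are illustrative assumptions.

    # Minimal EWC sketch (illustrative, not the authors' implementation).
    import torch
    import torch.nn.functional as F

    def estimate_diagonal_fisher(model, data_loader, n_batches=100):
        """Approximate the diagonal of the Fisher information matrix by
        averaging squared gradients of the log-likelihood on old-task data."""
        fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
                  if p.requires_grad}
        model.eval()
        for i, (x, _) in enumerate(data_loader):
            if i >= n_batches:
                break
            model.zero_grad()
            log_probs = F.log_softmax(model(x), dim=1)
            # Sample labels from the model's own predictive distribution,
            # since the Fisher is an expectation under the model.
            labels = torch.multinomial(log_probs.exp(), 1).squeeze(1)
            F.nll_loss(log_probs, labels).backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    fisher[n] += p.grad.detach() ** 2
        return {n: f / n_batches for n, f in fisher.items()}

    def ewc_penalty(model, fisher, old_params, lam):
        """Quadratic penalty pulling each weight toward its old-task value,
        scaled by its estimated importance (diagonal Fisher entry)."""
        penalty = 0.0
        for n, p in model.named_parameters():
            if n in fisher:
                penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
        return (lam / 2.0) * penalty

    # Usage when training on the new task, where old_params holds copies of
    # the parameters learned on the previous task, e.g.
    # old_params = {n: p.detach().clone() for n, p in model.named_parameters()}:
    #   loss = task_loss + ewc_penalty(model, fisher, old_params, lam=400.0)
    #   loss.backward(); optimizer.step()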