Overcoming Catastrophic Forgetting with Hard Attention to the Task

29 May 2018 | Joan Serrà, Dídac Surís, Marius Miron, Alexandros Karatzoglou
This paper addresses the issue of catastrophic forgetting in neural networks, where a network loses previously learned information when trained on new tasks. The authors propose a task-based hard attention mechanism called Hard Attention to the Task (HAT) that preserves information from previous tasks without affecting the learning of new tasks. HAT learns almost-binary attention vectors through gated task embeddings, using stochastic gradient descent. These attention vectors are used to create masks that condition the updates of network weights, ensuring that a portion of the weights remains static while the rest adapts to new tasks. The approach is evaluated on image classification tasks using a high-standard evaluation protocol, showing a 45 to 80% reduction in catastrophic forgetting rates compared to existing methods. HAT is also shown to be robust to hyperparameter choices and offers monitoring capabilities, making it suitable for online learning and network compression applications.
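To make the mechanism concrete, here is a minimal sketch of the two ideas summarized above: a per-task, per-layer embedding pushed through a scaled sigmoid to produce an almost-binary attention mask over a layer's units, and a gradient-conditioning step that shrinks updates for weights that previous tasks rely on. This is not the authors' code; the class and function names (`HATMLP`, `condition_grad`), the network size, and the PyTorch framing are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HATMLP(nn.Module):
    """Sketch of a single-hidden-layer network with a HAT-style gate (illustrative, not the paper's code)."""
    def __init__(self, n_tasks, in_dim=784, hidden=256, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)
        # One learnable task embedding per task for the hidden layer's units.
        self.emb1 = nn.Embedding(n_tasks, hidden)

    def mask(self, task_id, s):
        # Hard attention: sigmoid of a positively scaled task embedding.
        # A large scale s pushes the mask toward a binary {0, 1} gate.
        t = torch.tensor([task_id])
        return torch.sigmoid(s * self.emb1(t))

    def forward(self, x, task_id, s):
        a1 = self.mask(task_id, s)        # almost-binary attention vector
        h = torch.relu(self.fc1(x)) * a1  # gate the hidden units for this task
        return self.fc2(h)

def condition_grad(weight_grad, a_prev_out, a_prev_in):
    # Condition the weight update: scale each gradient entry by
    # 1 - min(previous-task mask at the output unit, previous-task mask at the input unit),
    # so weights heavily used by earlier tasks stay (almost) static.
    keep = 1.0 - torch.min(a_prev_out.unsqueeze(1), a_prev_in.unsqueeze(0))
    return weight_grad * keep
```

In training, the scale `s` is typically annealed from a small value to a large maximum within each epoch, so the mask is soft (and trainable by SGD) early on and nearly binary by the end; after a task is learned, its mask is folded into a cumulative mask that feeds `condition_grad` for all subsequent tasks.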