An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

4 Mar 2015 | Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio
This paper investigates catastrophic forgetting in gradient-based neural networks: the tendency of a network to lose performance on a previously learned ("old") task while being trained on a new one. The authors compare several training algorithms and activation functions on sequential pairs of tasks and reach two main conclusions. First, dropout is consistently the best training algorithm for modern feedforward networks: it adapts well to the new task while retaining the most performance on the old one, and it is the most effective at limiting catastrophic forgetting. Second, the choice of activation function also affects performance, but far less consistently; different tasks and different relationships between the old and new task produce different rankings of the activation functions, so the activation function should always be cross-validated.

The relationship between the two tasks strongly influences the outcome of learning. The study examines three kinds of task pairs: functionally identical tasks with different input formats (e.g., the same classification problem with permuted input pixels), similar tasks, and dissimilar tasks. Dropout tends to increase the optimal network size, which may partly explain its resistance to forgetting; however, this effect is not consistent, and on dissimilar task pairs dropout usually decreases the optimal size, suggesting that dropout has other beneficial effects that remain to be explored.

Among the activation functions, maxout is the only one that appears on the frontier of old-task/new-task performance tradeoffs for every task pair considered. It is not the best function at every point along the tradeoff curve, though, and its performance is less consistent when trained without dropout. The authors therefore conclude that dropout should be used when training modern neural networks, and that the choice of activation function should be cross-validated for each task pair.
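The following is a minimal sketch, assuming PyTorch and torchvision, of the two-task forgetting protocol described above; it is not the authors' code. The "old" task is MNIST and the "new" task is MNIST with a fixed pixel permutation, i.e. the "functionally identical tasks with different input formats" condition. The network size, dropout rate, optimizer settings, and epoch counts are illustrative assumptions rather than the paper's hyperparameters.

# Sketch of the two-task forgetting protocol (illustrative hyperparameters).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

torch.manual_seed(0)

# Flatten each image to a 784-dimensional vector.
tfm = transforms.Compose([transforms.ToTensor(),
                          transforms.Lambda(lambda x: x.view(-1))])
train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True, transform=tfm),
    batch_size=128, shuffle=True)
test_loader = DataLoader(
    datasets.MNIST("data", train=False, download=True, transform=tfm),
    batch_size=1000)

# The "new" task reuses the same data with a fixed pixel permutation applied.
perm = torch.randperm(784)

def make_net(p_drop=0.5):
    # Small ReLU MLP with dropout between layers.
    return nn.Sequential(
        nn.Linear(784, 1024), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(1024, 10))

def train(model, loader, permute, epochs=2):
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            if permute:
                x = x[:, perm]
            opt.zero_grad()
            nn.functional.cross_entropy(model(x), y).backward()
            opt.step()

def accuracy(model, loader, permute):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            if permute:
                x = x[:, perm]
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

model = make_net()
train(model, train_loader, permute=False)              # stage 1: old task
old_before = accuracy(model, test_loader, permute=False)
train(model, train_loader, permute=True)               # stage 2: new task only
old_after = accuracy(model, test_loader, permute=False)
new_after = accuracy(model, test_loader, permute=True)
print(f"old task: {old_before:.3f} -> {old_after:.3f}; new task: {new_after:.3f}")

The gap between old-task accuracy before and after the second training stage is the measure of forgetting; repeating the run with and without the Dropout layers gives the comparison the summary describes.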
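As a hedged illustration of the cross-validation recommendation, the fragment below repeats the two-task protocol once per candidate activation and records new-task accuracy and old-task retention for each. It reuses train, accuracy, train_loader, and test_loader from the sketch above; the candidate list and the single hidden layer are illustrative. Maxout, highlighted in the paper, has no built-in torch.nn module and is omitted here.

# Cross-validating the activation function over the same task pair.
import torch.nn as nn

candidates = {
    "relu": nn.ReLU(),
    "tanh": nn.Tanh(),
    "sigmoid": nn.Sigmoid(),   # logistic sigmoid
}

results = {}
for name, act in candidates.items():
    model = nn.Sequential(nn.Linear(784, 1024), act, nn.Dropout(0.5),
                          nn.Linear(1024, 10))
    train(model, train_loader, permute=False)    # old task
    train(model, train_loader, permute=True)     # new task
    results[name] = {
        "new_task_acc": accuracy(model, test_loader, permute=True),
        "old_task_acc": accuracy(model, test_loader, permute=False),
    }

# Pick the activation with the best validation tradeoff for this task pair,
# or report the whole tradeoff frontier as the paper does.
print(results)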