An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

4 Mar 2015 | Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio
This paper investigates catastrophic forgetting in gradient-based neural networks: the tendency of a network to lose performance on a previously learned ("old") task while being trained on a new one. The authors compare several training algorithms and activation functions on sequential pairs of tasks and reach two main conclusions. First, dropout is consistently the best training algorithm for modern feedforward networks: it adapts well to the new task while retaining the most performance on the old one, and it is the most effective at limiting catastrophic forgetting. Second, the choice of activation function also affects performance, but far less consistently; different tasks and different relationships between the old and new task produce different rankings of the activation functions, so the activation function should always be cross-validated.

The relationship between the two tasks strongly influences the outcome of learning. The study examines three kinds of task pairs: functionally identical tasks with different input formats (e.g., the same classification problem with permuted input pixels), similar tasks, and dissimilar tasks. Dropout tends to increase the optimal network size, which may partly explain its resistance to forgetting; however, this effect is not consistent, and on dissimilar task pairs dropout usually decreases the optimal size, suggesting that dropout has other beneficial effects that remain to be explored.

Among the activation functions, maxout is the only one that appears on the frontier of old-task/new-task performance tradeoffs for every task pair considered. It is not the best function at every point along the tradeoff curve, though, and its performance is less consistent when trained without dropout. The authors therefore conclude that dropout should be used when training modern neural networks, and that the choice of activation function should be cross-validated for each task pair.
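The following is a minimal sketch, assuming PyTorch and torchvision, of the two-task forgetting protocol described above; it is not the authors' code. The "old" task is MNIST and the "new" task is MNIST with a fixed pixel permutation, i.e. the "functionally identical tasks with different input formats" condition. The network size, dropout rate, optimizer settings, and epoch counts are illustrative assumptions rather than the paper's hyperparameters.

# Sketch of the two-task forgetting protocol (illustrative hyperparameters).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

torch.manual_seed(0)

# Flatten each image to a 784-dimensional vector.
tfm = transforms.Compose([transforms.ToTensor(),
                          transforms.Lambda(lambda x: x.view(-1))])
train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True, transform=tfm),
    batch_size=128, shuffle=True)
test_loader = DataLoader(
    datasets.MNIST("data", train=False, download=True, transform=tfm),
    batch_size=1000)

# The "new" task reuses the same data with a fixed pixel permutation applied.
perm = torch.randperm(784)

def make_net(p_drop=0.5):
    # Small ReLU MLP with dropout between layers.
    return nn.Sequential(
        nn.Linear(784, 1024), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(1024, 10))

def train(model, loader, permute, epochs=2):
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            if permute:
                x = x[:, perm]
            opt.zero_grad()
            nn.functional.cross_entropy(model(x), y).backward()
            opt.step()

def accuracy(model, loader, permute):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            if permute:
                x = x[:, perm]
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

model = make_net()
train(model, train_loader, permute=False)              # stage 1: old task
old_before = accuracy(model, test_loader, permute=False)
train(model, train_loader, permute=True)               # stage 2: new task only
old_after = accuracy(model, test_loader, permute=False)
new_after = accuracy(model, test_loader, permute=True)
print(f"old task: {old_before:.3f} -> {old_after:.3f}; new task: {new_after:.3f}")

The gap between old-task accuracy before and after the second training stage is the measure of forgetting; repeating the run with and without the Dropout layers gives the comparison the summary describes.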
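As a hedged illustration of the cross-validation recommendation, the fragment below repeats the two-task protocol once per candidate activation and records new-task accuracy and old-task retention for each. It reuses train, accuracy, train_loader, and test_loader from the sketch above; the candidate list and the single hidden layer are illustrative. Maxout, highlighted in the paper, has no built-in torch.nn module and is omitted here.

# Cross-validating the activation function over the same task pair.
import torch.nn as nn

candidates = {
    "relu": nn.ReLU(),
    "tanh": nn.Tanh(),
    "sigmoid": nn.Sigmoid(),   # logistic sigmoid
}

results = {}
for name, act in candidates.items():
    model = nn.Sequential(nn.Linear(784, 1024), act, nn.Dropout(0.5),
                          nn.Linear(1024, 10))
    train(model, train_loader, permute=False)    # old task
    train(model, train_loader, permute=True)     # new task
    results[name] = {
        "new_task_acc": accuracy(model, test_loader, permute=True),
        "old_task_acc": accuracy(model, test_loader, permute=False),
    }

# Pick the activation with the best validation tradeoff for this task pair,
# or report the whole tradeoff frontier as the paper does.
print(results)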