This paper introduces Utility-based Perturbed Gradient Descent (UPGD), a novel approach to continual learning that addresses both catastrophic forgetting and loss of plasticity. UPGD combines gradient updates with perturbations, applying smaller modifications to more useful units to protect them from forgetting, and larger modifications to less useful units to rejuvenate their plasticity. The method is evaluated in a challenging streaming learning setup with hundreds of non-stationarities and unknown task boundaries. Results show that UPGD continues to improve performance over time and surpasses or remains competitive with all compared methods on all problems. In extended reinforcement learning experiments with PPO, UPGD avoids performance drops by addressing both continual learning issues.
Continual learning remains a significant challenge for artificial intelligence, with catastrophic forgetting and loss of plasticity being the two major issues. Catastrophic forgetting occurs when a neural network overwrites previously learned weights or features while learning something new, losing its ability to retain and leverage past knowledge. Loss of plasticity refers to a learner's diminishing ability to learn new things. Existing methods typically address these issues separately; few tackle both simultaneously.
UPGD addresses both issues by using a utility measure to guide gradient updates. The utility of a weight is defined as the change in loss when the weight is set to zero. This utility is approximated using a second-order Taylor expansion, allowing for efficient computation. UPGD uses this utility measure to protect useful weights and perturb less useful ones, maintaining plasticity and reducing forgetting.
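To make the utility gating concrete, below is a minimal PyTorch-style sketch of one utility-gated, perturbed update. It is an illustration under stated assumptions, not the paper's reference implementation: it keeps only the first-order term of the Taylor approximation of the utility (the paper also derives a second-order term using the Hessian diagonal), tracks utilities with an exponential moving average, and scales them by the global maximum before gating. The function name upgd_step and all hyperparameter values are hypothetical.

```python
import torch

def upgd_step(params, utility_traces=None, lr=0.01, noise_std=0.001, beta=0.99):
    """One utility-gated perturbed update over a list of parameters.

    Sketch assumptions (not the paper's exact implementation):
      - first-order utility approximation u_i ~= -g_i * w_i, i.e. the estimated
        change in loss if weight w_i were set to zero,
      - utilities tracked with an exponential moving average (decay `beta`),
      - utilities scaled to [0, 1] by the global maximum before gating.
    Call after loss.backward() so that p.grad is populated; the caller is
    responsible for zeroing gradients between steps.
    """
    if utility_traces is None:
        utility_traces = [torch.zeros_like(p) for p in params]

    with torch.no_grad():
        # Running utility estimate per weight: loss increase if the weight were zeroed.
        for p, u in zip(params, utility_traces):
            instant_utility = -p.grad * p                      # first-order Taylor term
            u.mul_(beta).add_(instant_utility, alpha=1 - beta)

        # Scale utilities to [0, 1] by the global maximum across all weights.
        global_max = torch.stack([u.abs().max() for u in utility_traces]).max().clamp(min=1e-8)

        for p, u in zip(params, utility_traces):
            scaled_u = (u / global_max).clamp(0.0, 1.0)
            noise = noise_std * torch.randn_like(p)
            # Useful weights (scaled_u near 1) are barely touched, protecting them
            # from forgetting; low-utility weights receive the full gradient step
            # plus a perturbation, restoring their plasticity.
            p.add_(-(lr * (p.grad + noise)) * (1.0 - scaled_u))

    return utility_traces
```

The key design point is the (1 - scaled utility) factor: weights judged useful receive almost no gradient or noise, while weights judged useless are both updated and perturbed, which is how a single update rule can simultaneously guard against forgetting and counter the loss of plasticity.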
The paper evaluates UPGD on several tasks, including Input-Permuted MNIST, Label-Permuted CIFAR-10, and Label-Permuted EMNIST. Results show that UPGD maintains network plasticity and reuses previously learned useful features, outperforming competing methods in both accuracy and retained plasticity. In reinforcement learning experiments, UPGD prevents policy collapse in PPO, demonstrating its effectiveness against both catastrophic forgetting and loss of plasticity. The method is scalable and efficient, making it suitable for a wide range of continual learning tasks.