22 Oct 2022 | Andrei A. Rusu*, Neil C. Rabinowitz*, Guillaume Desjardins*, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell
Progressive neural networks are a novel architecture designed to enable transfer learning across sequences of tasks while avoiding catastrophic forgetting. Unlike traditional finetuning, which overwrites previously learned parameters and thereby degrades performance on earlier tasks, progressive networks maintain a pool of frozen pretrained models and use lateral connections to retain and reuse prior knowledge. This allows them to achieve richer compositionality, integrating prior knowledge at each layer of the feature hierarchy. The architecture is particularly effective in reinforcement learning (RL) tasks, where it outperforms baselines in transfer performance without the destructive consequences of finetuning.
The paper introduces progressive networks, which handle continual learning by maintaining a set of pretrained models and using lateral connections to transfer knowledge between tasks. Each new task is handled by a new column in the network, which is initialized randomly and connected to the features of previously trained columns, whose parameters remain frozen. This structure prevents catastrophic forgetting and allows the network to accumulate knowledge over time. The architecture is evaluated on a variety of RL tasks, including Atari games and 3D maze environments, where it demonstrates superior performance compared to traditional finetuning approaches.
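The column structure above can be sketched in a few lines. This is a hypothetical, minimal pure-Python rendering (the names `Column`, `matvec`, and the layer sizes are illustrative, not the paper's implementation, which used convolutional policy networks): each layer of a new column combines its own weights `W` with lateral adapters `U` reading the previous layer's activations from every earlier, frozen column.

```python
import random

def matvec(W, x):
    """Dense matrix-vector product on plain lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class Column:
    """One column of a progressive network (illustrative sketch).

    Layer i computes  h_i = relu(W_i h_{i-1} + sum_k U_i^(k) h_{i-1}^(k)),
    where h_{i-1}^(k) are the layer-(i-1) activations of earlier columns,
    fed in through lateral adapters U. Earlier columns are never updated.
    """
    def __init__(self, sizes, prev_columns):
        self.prev = prev_columns  # earlier columns: frozen, reused but not trained
        # own weights, randomly initialised for the new task
        self.W = [rand_matrix(sizes[i + 1], sizes[i]) for i in range(len(sizes) - 1)]
        # one lateral adapter per earlier column, per layer
        self.U = [[rand_matrix(sizes[i + 1], sizes[i]) for _ in prev_columns]
                  for i in range(len(sizes) - 1)]

    def forward(self, x):
        # activations of every earlier column on the same input (weights frozen)
        prev_acts = [col.activations(x) for col in self.prev]
        h, acts = x, [x]
        for i in range(len(self.W)):
            pre = matvec(self.W[i], h)
            for k, col_acts in enumerate(prev_acts):
                lateral = matvec(self.U[i][k], col_acts[i])
                pre = [a + b for a, b in zip(pre, lateral)]
            h = relu(pre)
            acts.append(h)
        self._acts = acts
        return h

    def activations(self, x):
        self.forward(x)
        return self._acts
```

Adding a task is then just `Column(sizes, [col1])`: the new column starts from random weights but can immediately read column 1's features at every layer, which is where the transfer comes from.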
The paper also presents a detailed analysis of transfer using two methods: Average Perturbation Sensitivity (APS) and Average Fisher Sensitivity (AFS). These methods reveal that transfer occurs at both low-level sensory and high-level control layers of the learned policy. The results show that progressive networks can effectively transfer knowledge between tasks, even when the tasks are conceptually or visually distinct.
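A simplified numeric sketch of the Fisher-based analysis: the paper defines AFS via the Fisher information of the policy with respect to each column's normalized representation at a given layer; here, as an assumption for illustration, the diagonal Fisher is approximated by the mean squared gradient of the log-policy over samples, and AFS is that quantity normalized across columns (so the values say which column's features the policy actually relies on at that layer).

```python
def average_fisher_sensitivity(grad_samples):
    """Simplified AFS at one layer.

    grad_samples[t][k] is the gradient of the log-policy w.r.t. column k's
    features at this layer, for sample t. The diagonal Fisher for each
    column is estimated as the mean squared gradient; AFS normalizes these
    so they sum to 1 across columns.
    """
    n_cols = len(grad_samples[0])
    fisher = [0.0] * n_cols
    for sample in grad_samples:
        for k, g in enumerate(sample):
            fisher[k] += sum(v * v for v in g) / len(grad_samples)
    total = sum(fisher)
    return [f / total for f in fisher]
```

An AFS near 1 for an old column at the sensory layers and near 0 at the control layers, for instance, would indicate the new task reuses low-level features but learns its own policy head.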
The paper also discusses the limitations of progressive networks, chiefly the growth in model size with the number of tasks. The authors suggest this growth can be mitigated through techniques such as pruning or online compression, and note that the architecture remains well suited to RL settings, where it learns new tasks without forgetting previously acquired knowledge.
Overall, progressive networks represent a significant advancement in the field of continual learning, offering a robust and effective solution for transfer learning across sequences of tasks. The architecture is particularly well-suited for RL applications, where it can learn and adapt to new tasks while maintaining the ability to transfer knowledge from previous experiences.