2024 | Maciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski, Michał Bortkiewicz, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś
Fine-tuning reinforcement learning (RL) models is a critical challenge because it often destroys pre-trained capabilities, a phenomenon the paper terms "forgetting of pre-trained capabilities" (FPC). The issue arises from the interplay between actions and observations in RL: the model's performance deteriorates on parts of the state space that are not visited during the initial phase of fine-tuning. The paper identifies two key instances of FPC: the state coverage gap and the imperfect cloning gap. In the state coverage gap, the pre-trained policy performs well on states far from the start of the task (distant states) but not on those encountered first during fine-tuning (close states); while the agent learns the close states, it forgets how to act on the distant ones before it can reach them again. In the imperfect cloning gap, the pre-trained policy is effective on both close and distant states, yet fine-tuning still degrades performance on distant states because updates made on the close states interfere with the previously acquired behavior.
The study demonstrates that standard knowledge retention techniques, such as Elastic Weight Consolidation (EWC), behavioral cloning (BC), kickstarting (KS), and episodic memory (EM), effectively mitigate FPC. By preserving pre-trained capabilities during fine-tuning, these methods yield substantial performance gains. The paper evaluates them in three environments: NetHack, Montezuma's Revenge, and RoboticSequence. In NetHack, fine-tuning with knowledge retention roughly doubles performance, reaching over 10K points compared to the previous best of about 5K. In Montezuma's Revenge, BC and EWC outperform vanilla fine-tuning, while in RoboticSequence, BC is the most effective method.
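To make these retention mechanisms concrete, below is a minimal PyTorch-style sketch of two of the techniques named above: a behavioral-cloning/kickstarting term that keeps the fine-tuned policy close to the frozen pre-trained policy on buffered states, and an EWC penalty that anchors parameters important for the pre-trained behavior. The names (`policy`, `pretrained_policy`, `retention_states`, `fisher_diag`) are illustrative assumptions, not identifiers from the paper's codebase, and the exact losses and coefficients used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def bc_retention_loss(policy, pretrained_policy, retention_states, coef=1.0):
    # Behavioral-cloning / kickstarting style term: penalize the KL divergence
    # between the current policy and the frozen pre-trained policy on states
    # sampled from the pre-training distribution (e.g. an episodic buffer).
    with torch.no_grad():
        teacher_logits = pretrained_policy(retention_states)  # frozen teacher
    student_logits = policy(retention_states)                 # policy being fine-tuned
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return coef * kl

def ewc_penalty(policy, anchor_params, fisher_diag, coef=1.0):
    # Elastic Weight Consolidation: a quadratic penalty, weighted by a diagonal
    # Fisher information estimate, that keeps parameters important for the
    # pre-trained behavior close to their pre-trained values.
    penalty = sum(
        (fisher_diag[name] * (param - anchor_params[name]) ** 2).sum()
        for name, param in policy.named_parameters()
    )
    return 0.5 * coef * penalty
```

During fine-tuning, either term would simply be added to the usual RL objective, e.g. `total_loss = rl_loss + bc_retention_loss(policy, pretrained_policy, buffer_states)`, so the agent keeps receiving a learning signal on states it no longer visits.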
The paper highlights that forgetting of pre-trained capabilities is a significant issue in RL, as it can lead to substantial performance deterioration. The findings suggest that knowledge retention techniques are essential for effective fine-tuning of pre-trained RL models. The study also discusses the broader implications of these findings, noting that the principles of knowledge retention and forgetting explored in this work could be relevant beyond RL, potentially impacting a wide range of learning systems that evolve over time.