2024 | Maciej Wolczyk *1, Bartłomiej Cupial *1 2, Mateusz Ostaszewski 3, Michal Bortkiewicz 3, Michal Zając 4, Razvan Pascanu 5, Łukasz Kuciński 1 2 6, Piotr Miłoś 1 2 6 7
This paper explores the challenges of, and solutions for, effectively transferring knowledge from pre-trained models to downstream tasks in reinforcement learning (RL). The authors identify a critical issue termed *forgetting of pre-trained capabilities* (FPC): during fine-tuning, the model loses its ability to perform well on parts of the downstream task that were not visited early in fine-tuning, even though it handled them after pre-training. This problem is particularly pronounced in RL because actions determine which observations the agent encounters, and the authors distinguish two specific instances: the *state coverage gap* and the *imperfect cloning gap*. They demonstrate that standard knowledge retention techniques, such as Elastic Weight Consolidation (EWC), behavioral cloning (BC), kickstarting (KS), and episodic memory (EM), mitigate FPC and improve performance across a range of environments, including NetHack, Montezuma's Revenge, and robotic tasks. These techniques enable the fine-tuned model to leverage its pre-trained capabilities more effectively, yielding substantial improvements over prior state-of-the-art results. The study highlights the importance of addressing FPC in RL fine-tuning to strengthen the transfer capabilities of pre-trained models.
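Of the retention techniques listed above, EWC is the simplest to illustrate: it adds a quadratic penalty that anchors each parameter to its pre-trained value, weighted by an estimate of that parameter's importance (typically the diagonal Fisher information). The sketch below is a minimal, framework-agnostic illustration of that penalty; the function name, the toy parameter values, and the use of NumPy arrays are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ewc_penalty(params, pretrained_params, fisher, lam=1.0):
    # EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    # where theta* are the pre-trained parameters and F is a per-parameter
    # importance estimate (e.g., the diagonal of the Fisher information).
    penalty = 0.0
    for p, p0, f in zip(params, pretrained_params, fisher):
        penalty += np.sum(f * (p - p0) ** 2)
    return lam / 2.0 * penalty

# Toy usage (hypothetical values): drift on a high-importance coordinate
# is penalized, drift on a low-importance one is nearly free.
theta_pre = [np.array([1.0, 2.0])]   # pre-trained parameters
theta_new = [np.array([1.5, 2.0])]   # parameters after some fine-tuning
fisher    = [np.array([4.0, 0.1])]   # first coordinate matters more
print(ewc_penalty(theta_new, theta_pre, fisher, lam=2.0))  # 1.0
```

In fine-tuning, this penalty would be added to the RL objective, so gradient updates trade off downstream reward against staying close to the pre-trained solution on important parameters.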