19 Feb 2024 | Avinandan Bose, Simon Shaolei Du, Maryam Fazel
This paper presents an algorithm for offline multi-task transfer reinforcement learning (MTRL) with representational penalization. The goal is to learn a shared representation across multiple source tasks to improve performance on a target task. Unlike online RL, where the agent interacts with the environment to learn a policy, offline RL relies on pre-collected data, which can lead to incomplete coverage of state-action spaces. The proposed algorithm computes pointwise uncertainty measures for the learned representation and establishes a data-dependent upper bound on the suboptimality of the target policy. By leveraging the collective exploration of the source tasks, the algorithm compensates for poor coverage in some regions, allowing for more effective transfer. Theoretical analysis and empirical evaluation on a rich observation MDP demonstrate the benefits of quantifying and penalizing uncertainty in the learned representation. The algorithm outperforms baselines in scenarios with limited data, showing the importance of uncertainty quantification in offline MTRL. The results highlight the effectiveness of the proposed method in achieving near-optimal policies for the target task.
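To make the pessimism idea concrete, below is a minimal sketch of one backup step of penalized (pessimistic) value estimation on top of a linear representation, assuming the feature map `phi` stands in for the representation transferred from the source tasks. The function name `pessimistic_q_update` and the parameters `beta` and `reg` are illustrative choices, not the paper's implementation; the paper's pointwise representational uncertainty is more refined than the simple elliptical bonus used here.

```python
import numpy as np

def pessimistic_q_update(phi, dataset, V_next, gamma=0.99, beta=1.0, reg=1.0):
    """One pessimistic backup with an elliptical uncertainty penalty.

    phi     : callable (state, action) -> feature vector of dimension d
              (assumed to be the representation learned on the source tasks)
    dataset : list of (s, a, r, s_next) transitions collected offline on the target task
    V_next  : callable state -> estimated value at the next step
    Returns a penalized Q-function for the current step.
    """
    d = phi(*dataset[0][:2]).shape[0]

    # Regularized empirical feature covariance on the offline target data.
    Lambda = reg * np.eye(d)
    for s, a, r, s_next in dataset:
        f = phi(s, a)
        Lambda += np.outer(f, f)
    Lambda_inv = np.linalg.inv(Lambda)

    # Ridge-regression estimate of the Bellman target r + gamma * V(s').
    targets = np.array([r + gamma * V_next(s_next) for s, a, r, s_next in dataset])
    features = np.stack([phi(s, a) for s, a, r, s_next in dataset])
    w = Lambda_inv @ features.T @ targets

    def Q(s, a):
        f = phi(s, a)
        # Pointwise uncertainty at (s, a): large where the offline data covers poorly.
        bonus = beta * np.sqrt(f @ Lambda_inv @ f)
        # Pessimism: subtract the uncertainty penalty from the estimated value.
        return float(f @ w - bonus)

    return Q
```

The penalty shrinks where the combined source-and-target data cover a state-action region well and grows where coverage is poor, which is the mechanism by which the learned policy avoids regions the offline data cannot certify.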