Q-value Regularized Transformer for Offline Reinforcement Learning

2024 | Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao
This paper introduces the Q-value regularized Transformer (QT), a novel approach for offline reinforcement learning (RL) that combines the trajectory modeling ability of the Transformer with the ability of dynamic programming (DP) methods to predict optimal future returns. QT learns an action-value function and adds a term that maximizes action-values to the training loss of Conditional Sequence Modeling (CSM), seeking optimal actions that stay close to the behavior policy. Empirical evaluations on the D4RL benchmark demonstrate that QT outperforms traditional DP and CSM methods, highlighting its potential to advance the state of the art in offline RL.

Offline RL aims to learn effective policies from previously collected data without interacting with the environment. Recent work has shifted focus from policy regularization and value function approximation to a generic CSM task in which past experience is fed to a Transformer. This converts offline RL into a supervised learning problem, allowing the model to handle long sequences and avoid the stability issues associated with bootstrapping. However, CSM struggles to stitch optimal trajectories from suboptimal ones because the returns sampled from the data are inconsistent with optimal returns. DP methods offer a solution by approximating optimal future returns, but they are prone to unstable learning, especially in long-horizon and sparse-reward scenarios.

QT addresses these challenges by integrating a Q-value module into the Transformer policy, enabling the selection of high-reward actions while preserving the original trajectory modeling ability. The training objective has two components: a conditional behavior cloning term that aligns the Transformer's action sampling with the distribution of the training set, and a policy improvement term that selects high-reward actions according to the learned Q-value. This hybrid structure offers effective distribution matching, identification of high-reward actions, and a balance between selecting optimal actions and maintaining fidelity to the behavior policy.
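As a rough illustration of how such a combined objective could look in practice, the following is a minimal PyTorch sketch of a loss with a conditional behavior cloning term plus a Q-maximization term. The names (transformer_policy, critic, alpha) and the Q-normalization trick are assumptions for illustration, not the paper's exact implementation.

```python
# Illustrative sketch (not the paper's code): conditional behavior cloning
# plus a Q-value maximization term on the policy's predicted actions.
import torch
import torch.nn.functional as F

def qt_policy_loss(transformer_policy, critic, states, actions, returns_to_go, alpha=1.0):
    """Actor loss for a batch of sub-trajectories.

    states:        (B, T, state_dim)   observed states
    actions:       (B, T, action_dim)  dataset (behavior-policy) actions
    returns_to_go: (B, T, 1)           return-to-go conditioning tokens
    """
    # Conditional behavior cloning: the Transformer predicts dataset actions
    # conditioned on the trajectory context.
    pred_actions = transformer_policy(states, actions, returns_to_go)  # (B, T, action_dim)
    bc_loss = F.mse_loss(pred_actions, actions)

    # Policy improvement: push predicted actions toward high Q-values
    # (only the last timestep is scored here for simplicity).
    q_values = critic(states[:, -1], pred_actions[:, -1])              # (B, 1)
    # Normalizing by the Q magnitude keeps the two terms on a similar scale;
    # the paper's exact weighting may differ.
    q_term = -q_values.mean() / q_values.abs().mean().detach()

    return bc_loss + alpha * q_term
```

The weight alpha controls the trade-off described above: a larger value favors selecting high-Q actions, a smaller value keeps the policy closer to the behavior distribution.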
The results show that QT consistently achieves superior performance across Gym, Adroit, Kitchen, Maze2D, and AntMaze tasks on the D4RL benchmark, outperforming existing methods in trajectory stitching, sparse-reward handling, and long task horizons. The Q-value module enhances the policy by enabling preferential sampling of high-value actions, aligning the learning process more closely with optimal returns and yielding improved performance over the baseline behavior policy.
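For context, the action-value function behind this preferential sampling can be learned with a standard one-step temporal-difference objective. The sketch below is a generic illustration under that assumption (critic, target_critic, and the batch layout are hypothetical names); details such as double Q-learning, target smoothing, or ensembles would follow the paper.

```python
# Illustrative sketch of learning the Q-value module with a one-step TD target.
import torch
import torch.nn.functional as F

def critic_loss(critic, target_critic, batch, gamma=0.99):
    # target_critic is typically a slowly updated copy of critic.
    states, actions, rewards, next_states, next_actions, dones = batch
    with torch.no_grad():
        # Bootstrapped target; next_actions may come from the dataset or from
        # the current Transformer policy.
        target_q = rewards + gamma * (1.0 - dones) * target_critic(next_states, next_actions)
    return F.mse_loss(critic(states, actions), target_q)

def soft_update(critic, target_critic, tau=0.005):
    # Polyak averaging keeps the bootstrapped targets slowly moving and stable.
    for p, tp in zip(critic.parameters(), target_critic.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * p.data)
```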