Decision Transformer: Reinforcement Learning via Sequence Modeling


24 Jun 2021 | Lili Chen*, Kevin Lu*, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch (*equal contribution)
Decision Transformer frames reinforcement learning (RL) as a sequence modeling problem, leveraging the Transformer architecture and recent advances in language modeling. It casts RL as conditional sequence modeling: rather than learning a value function or a policy gradient, a causally masked Transformer is trained to output the actions that achieve a desired return, conditioned on that return, past states, and past actions. This sidesteps the need for bootstrapping in long-term credit assignment, avoiding one element of the "deadly triad" that can destabilize RL, and it requires no discounting of future rewards, which can otherwise induce short-sighted behavior. Decision Transformer matches or exceeds state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
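As a concrete illustration of this return-conditioned framing (a minimal sketch, not code from the paper; function names and the data layout are illustrative assumptions), the snippet below shows how an offline trajectory could be converted into the interleaved (return-to-go, state, action) sequence the model is trained on, where the return-to-go at timestep t is the undiscounted sum of rewards from t onward.

```python
import numpy as np

def returns_to_go(rewards):
    """Undiscounted return-to-go: R_t = r_t + r_{t+1} + ... + r_T for each timestep t."""
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def to_token_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples into one training sequence."""
    rtg = returns_to_go(rewards)
    tokens = []
    for t in range(len(rewards)):
        tokens.append(("return_to_go", rtg[t]))
        tokens.append(("state", states[t]))
        tokens.append(("action", actions[t]))
    return tokens

# A 3-step trajectory with a single terminal reward has returns-to-go [1, 1, 1],
# so each state token is paired with the return the agent is still expected to obtain.
print(returns_to_go(np.array([0.0, 0.0, 1.0])))  # -> [1. 1. 1.]
```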
Architecturally, Decision Transformer feeds returns-to-go, states, and actions into modality-specific linear embeddings and adds a learned episodic timestep encoding. The resulting tokens are processed by a GPT architecture that predicts actions autoregressively. Trained on offline sequences of returns, states, and actions, the model generates behavior at test time simply by conditioning on a desired return (a code sketch of this architecture appears below).

Because self-attention performs credit assignment directly over long sequences, Decision Transformer is effective in sparse or distracting reward settings and in tasks requiring long-term credit assignment, where traditional RL methods often struggle. The paper evaluates it on offline RL benchmarks including Atari, OpenAI Gym, and Key-to-Door, showing that it can match or exceed state-of-the-art model-free offline RL algorithms without dynamic programming. It also remains effective with limited data, outperforming methods such as Percentile Behavior Cloning. The paper further discusses the benefits of longer context lengths and the model's ability to capture the distribution of returns, arguing that sequence modeling could serve as a strong algorithmic paradigm for future RL research.
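The following is a minimal, self-contained sketch of the architecture described above, under assumed names and dimensions; it is not the authors' released implementation, and a standard causally masked Transformer encoder stands in for the GPT backbone. Returns-to-go, states, and actions each get a modality-specific linear embedding plus a shared learned timestep embedding, the interleaved tokens pass through the masked Transformer, and the action at each step is predicted from the hidden state at the corresponding state token.

```python
import torch
import torch.nn as nn

class DecisionTransformerSketch(nn.Module):
    """Illustrative sketch: modality-specific embeddings + causal Transformer + action head."""

    def __init__(self, state_dim, act_dim, hidden_dim=128, max_timestep=1000,
                 n_layers=3, n_heads=1):
        super().__init__()
        # Modality-specific linear embeddings for returns-to-go, states, and actions.
        self.embed_return = nn.Linear(1, hidden_dim)
        self.embed_state = nn.Linear(state_dim, hidden_dim)
        self.embed_action = nn.Linear(act_dim, hidden_dim)
        # Learned episodic timestep embedding, shared across the three modalities.
        self.embed_timestep = nn.Embedding(max_timestep, hidden_dim)
        layer = nn.TransformerEncoderLayer(hidden_dim, n_heads,
                                           dim_feedforward=4 * hidden_dim,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(hidden_dim, act_dim)

    def forward(self, returns_to_go, states, actions, timesteps):
        # Shapes: returns_to_go (B, K, 1), states (B, K, state_dim),
        #         actions (B, K, act_dim), timesteps (B, K) of ints.
        t_emb = self.embed_timestep(timesteps)
        r = self.embed_return(returns_to_go) + t_emb
        s = self.embed_state(states) + t_emb
        a = self.embed_action(actions) + t_emb
        # Interleave as (R_1, s_1, a_1, ..., R_K, s_K, a_K): sequence length 3K.
        B, K, H = r.shape
        tokens = torch.stack([r, s, a], dim=2).reshape(B, 3 * K, H)
        # Additive causal mask so each position attends only to earlier tokens.
        mask = torch.triu(torch.full((3 * K, 3 * K), float("-inf")), diagonal=1)
        hidden = self.transformer(tokens, mask=mask)
        # Predict action a_t from the hidden state at the state token s_t.
        hidden = hidden.reshape(B, K, 3, H)
        return self.predict_action(hidden[:, :, 1])

# Example: batch of 2, context length K=20, 17-dim states, 6-dim continuous actions.
model = DecisionTransformerSketch(state_dim=17, act_dim=6)
B, K = 2, 20
out = model(torch.randn(B, K, 1), torch.randn(B, K, 17),
            torch.randn(B, K, 6), torch.randint(0, 1000, (B, K)))
print(out.shape)  # torch.Size([2, 20, 6])
```

At evaluation time, the paper conditions on a target return as the first return-to-go token and decrements it by each reward actually received, so the model keeps generating actions consistent with the return it is still being asked to achieve.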