26 Aug 2024 | Hai Nguyen, Andrea Baisero, David Klee, Dian Wang, Robert Platt, Christopher Amato
This paper introduces equivariant reinforcement learning (RL) for partially observable Markov decision processes (POMDPs), leveraging symmetry to improve sample efficiency and performance in robotic tasks. The authors propose a framework for group-invariant POMDPs, where the optimal policy and value function must be equivariant and invariant, respectively. They develop equivariant actor-critic agents that inherently embed domain symmetry into their architectures, enabling experience gathered in one state to inform behavior in all of its symmetry-related counterparts. These agents outperform non-equivariant approaches in both sample efficiency and final performance, as demonstrated through experiments on simulated and real-world robotic tasks.
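The two properties stated above can be illustrated with a toy numerical check. The sketch below (purely illustrative; the function names, the C4 rotation group, and the group-averaging construction are assumptions for the demo, not the paper's architecture) builds an equivariant policy pi(g·o) = g·pi(o) and an invariant value V(g·o) = V(o) by averaging arbitrary base networks over the group orbit, then verifies both properties under 90-degree rotations:

```python
import numpy as np

# Toy sketch (not the paper's code): a C4-equivariant policy and a
# C4-invariant critic obtained by group-averaging arbitrary base
# functions, plus checks that pi(g.o) == g.pi(o) and V(g.o) == V(o).

def rot_action(k):
    """Rotation matrix acting on a 2-D action, for element k of C4."""
    theta = k * np.pi / 2
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def base_policy(obs):
    # Arbitrary NON-equivariant "network": two fixed pixel projections.
    return np.array([obs[0, 0] + obs[1, 2], obs[3, 1] - obs[2, 2]])

def base_value(obs):
    # Arbitrary NON-invariant "network": a fixed weighted pixel sum.
    w = np.arange(obs.size, dtype=float).reshape(obs.shape)
    return float((w * obs).sum())

def equivariant_policy(obs):
    # pi(o) = (1/|G|) sum_g  g^{-1} . f(g . o)  -- equivariant by construction.
    return np.mean([rot_action(-k) @ base_policy(np.rot90(obs, k))
                    for k in range(4)], axis=0)

def invariant_value(obs):
    # V(o) = (1/|G|) sum_g  V_base(g . o)  -- invariant by construction.
    return float(np.mean([base_value(np.rot90(obs, k)) for k in range(4)]))

obs = np.random.default_rng(0).random((4, 4))
for k in range(4):
    g_obs = np.rot90(obs, k)  # group action on the observation
    assert np.allclose(equivariant_policy(g_obs),
                       rot_action(k) @ equivariant_policy(obs))
    assert np.isclose(invariant_value(g_obs), invariant_value(obs))
```

Group averaging is only one way to obtain these properties; the paper instead builds the symmetry directly into the network layers, which avoids evaluating the base network once per group element.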
The key contributions include extending group-invariant MDPs to POMDPs, introducing equivariant actor-critic agents that embed domain symmetry, and applying these agents to realistic robot manipulation tasks with sparse rewards. The agents use equivariant modules, including equivariant CNNs and recurrent neural networks, to maintain equivariance under group transformations. The framework is tested on various domains, including grid-world and robotic manipulation tasks, showing significant improvements in performance.
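To make the "equivariant CNN" idea concrete, the following minimal numpy sketch (illustrative only; the C4 group, kernel sizes, and function names are assumptions, not the paper's implementation) shows a lifting convolution: correlating an image with all four rotations of one kernel produces a feature stack that transforms predictably under input rotation — each map rotates spatially and the channels permute cyclically:

```python
import numpy as np

# Hypothetical sketch of a C4 "lifting" convolution, one ingredient of
# group-equivariant CNNs. Rotating the input rotates each feature map
# and cyclically permutes the orientation channels.

def correlate2d(img, ker):
    """Plain 'valid' cross-correlation."""
    H, W = img.shape
    h, w = ker.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + h, j:j + w] * ker).sum()
    return out

def lift(img, ker):
    """C4 lifting layer: one feature map per kernel orientation."""
    return np.stack([correlate2d(img, np.rot90(ker, c)) for c in range(4)])

rng = np.random.default_rng(0)
img, ker = rng.random((6, 6)), rng.random((3, 3))

# Equivariance check: lift(g.img)[c] == g.(lift(img)[c - g]) for g in C4.
for g in range(4):
    rotated = lift(np.rot90(img, g), ker)
    for c in range(4):
        assert np.allclose(rotated[c],
                           np.rot90(lift(img, ker)[(c - g) % 4], g))
```

Stacking such layers (and, as in the paper, pairing them with recurrent modules whose hidden states also respect the group action) keeps the whole agent equivariant end to end.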
The paper also compares the proposed method with other baselines, including data augmentation techniques and model-based methods, demonstrating that equivariant agents achieve better results in terms of sample efficiency and performance. The results show that equivariant agents can handle domains with imperfect symmetry and generalize well to real-world scenarios, including zero-shot transfers to real hardware. The authors conclude that equivariant RL is a promising approach for tackling challenging robot learning domains with sample-efficient solutions.