26 Aug 2024 | Hai Nguyen, Andrea Baisero, David Klee, Dian Wang, Robert Platt, Christopher Amato
This paper introduces equivariant reinforcement learning (RL) for partially observable Markov decision processes (POMDPs), leveraging symmetry to improve sample efficiency and performance in robotic tasks. The authors propose a framework for group-invariant POMDPs, where the optimal policy and value function must be equivariant and invariant, respectively. They develop equivariant actor-critic agents that inherently embed domain symmetry into their architectures, enabling experience gathered in one state to inform behavior in all of its symmetry-related counterparts. These agents outperform non-equivariant approaches in both sample efficiency and final performance, as demonstrated through experiments on simulated and real-world robotic tasks.
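The two properties stated above can be illustrated with a toy numerical check. The sketch below (purely illustrative; the function names, the C4 rotation group, and the group-averaging construction are assumptions for the demo, not the paper's architecture) builds an equivariant policy pi(g·o) = g·pi(o) and an invariant value V(g·o) = V(o) by averaging arbitrary base networks over the group orbit, then verifies both properties under 90-degree rotations:

```python
import numpy as np

# Toy sketch (not the paper's code): a C4-equivariant policy and a
# C4-invariant critic obtained by group-averaging arbitrary base
# functions, plus checks that pi(g.o) == g.pi(o) and V(g.o) == V(o).

def rot_action(k):
    """Rotation matrix acting on a 2-D action, for element k of C4."""
    theta = k * np.pi / 2
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def base_policy(obs):
    # Arbitrary NON-equivariant "network": two fixed pixel projections.
    return np.array([obs[0, 0] + obs[1, 2], obs[3, 1] - obs[2, 2]])

def base_value(obs):
    # Arbitrary NON-invariant "network": a fixed weighted pixel sum.
    w = np.arange(obs.size, dtype=float).reshape(obs.shape)
    return float((w * obs).sum())

def equivariant_policy(obs):
    # pi(o) = (1/|G|) sum_g  g^{-1} . f(g . o)  -- equivariant by construction.
    return np.mean([rot_action(-k) @ base_policy(np.rot90(obs, k))
                    for k in range(4)], axis=0)

def invariant_value(obs):
    # V(o) = (1/|G|) sum_g  V_base(g . o)  -- invariant by construction.
    return float(np.mean([base_value(np.rot90(obs, k)) for k in range(4)]))

obs = np.random.default_rng(0).random((4, 4))
for k in range(4):
    g_obs = np.rot90(obs, k)  # group action on the observation
    assert np.allclose(equivariant_policy(g_obs),
                       rot_action(k) @ equivariant_policy(obs))
    assert np.isclose(invariant_value(g_obs), invariant_value(obs))
```

Group averaging is only one way to obtain these properties; the paper instead builds the symmetry directly into the network layers, which avoids evaluating the base network once per group element.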
The key contributions include extending group-invariant MDPs to POMDPs, introducing equivariant actor-critic agents that embed domain symmetry, and applying these agents to realistic robot manipulation tasks with sparse rewards. The agents use equivariant modules, including equivariant CNNs and recurrent neural networks, to maintain equivariance under group transformations. The framework is tested on various domains, including grid-world and robotic manipulation tasks, showing significant improvements in performance.
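To make the "equivariant CNN" idea concrete, the following minimal numpy sketch (illustrative only; the C4 group, kernel sizes, and function names are assumptions, not the paper's implementation) shows a lifting convolution: correlating an image with all four rotations of one kernel produces a feature stack that transforms predictably under input rotation — each map rotates spatially and the channels permute cyclically:

```python
import numpy as np

# Hypothetical sketch of a C4 "lifting" convolution, one ingredient of
# group-equivariant CNNs. Rotating the input rotates each feature map
# and cyclically permutes the orientation channels.

def correlate2d(img, ker):
    """Plain 'valid' cross-correlation."""
    H, W = img.shape
    h, w = ker.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + h, j:j + w] * ker).sum()
    return out

def lift(img, ker):
    """C4 lifting layer: one feature map per kernel orientation."""
    return np.stack([correlate2d(img, np.rot90(ker, c)) for c in range(4)])

rng = np.random.default_rng(0)
img, ker = rng.random((6, 6)), rng.random((3, 3))

# Equivariance check: lift(g.img)[c] == g.(lift(img)[c - g]) for g in C4.
for g in range(4):
    rotated = lift(np.rot90(img, g), ker)
    for c in range(4):
        assert np.allclose(rotated[c],
                           np.rot90(lift(img, ker)[(c - g) % 4], g))
```

Stacking such layers (and, as in the paper, pairing them with recurrent modules whose hidden states also respect the group action) keeps the whole agent equivariant end to end.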
The paper also compares the proposed method with other baselines, including data augmentation techniques and model-based methods, demonstrating that equivariant agents achieve better results in terms of sample efficiency and performance. The results show that equivariant agents can handle domains with imperfect symmetry and generalize well to real-world scenarios, including zero-shot transfers to real hardware. The authors conclude that equivariant RL is a promising approach for tackling challenging robot learning domains with sample-efficient solutions.