19 Jan 2024 | Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu
Safe offline reinforcement learning (SOLRL) is a promising approach to ensuring safe policy learning without risky online interactions. However, existing methods often rely on soft constraints, which can lead to unsafe outcomes in safety-critical applications. This paper introduces FISOR (Feasibility-guided Safe Offline RL), a novel method that enforces hard safety constraints by identifying the largest feasible region supported by the offline dataset. FISOR decomposes the problem into three decoupled processes: offline identification of the feasible region, optimal advantage learning, and policy extraction using a guided diffusion model. This decoupling yields both strong safety performance and stable training. Extensive evaluations on the DSRL benchmark show that FISOR satisfies the safety constraints in all tasks while achieving high returns in most of them, outperforming the baselines. The method also extends naturally to safe offline imitation learning (IL).
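The abstract does not give implementation details, so the following is only a minimal sketch of the three-stage decomposition under several stated assumptions: the feasibility value is trained with a discounted reachability-style backup (in the spirit of Fisac et al.'s safety Bellman equation), the advantage learner is IQL-style expectile regression, and the guided diffusion policy is replaced by a simple advantage-weighted behavior-cloning head to keep the code short. The class name `FISORSketch`, the network shapes, and the exact loss forms are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class FISORSketch:
    """Three decoupled training stages, all fit on a fixed offline batch."""

    def __init__(self, obs_dim, act_dim):
        self.h_value = mlp(obs_dim, 1)       # stage 1: feasibility value
        self.q = mlp(obs_dim + act_dim, 1)   # stage 2: reward critic
        self.v = mlp(obs_dim, 1)             # stage 2: reward value
        self.policy = mlp(obs_dim, act_dim)  # stage 3 stand-in (see note below)

    def stage1_loss(self, s, cost, s_next, gamma=0.99):
        # Discounted reachability-style backup: a state is feasible when the
        # learned value is non-positive. The paper's exact operator may differ.
        with torch.no_grad():
            target = (1 - gamma) * cost + gamma * torch.maximum(
                cost, self.h_value(s_next)
            )
        return ((self.h_value(s) - target) ** 2).mean()

    def stage2_loss(self, s, a, r, s_next, gamma=0.99, tau=0.7):
        # IQL-style expectile regression for the reward advantage (an
        # assumption; the paper calls this "optimal advantage learning").
        sa = torch.cat([s, a], dim=-1)
        with torch.no_grad():
            q_target = r + gamma * self.v(s_next)
        q_loss = ((self.q(sa) - q_target) ** 2).mean()
        diff = self.q(sa).detach() - self.v(s)
        weight = torch.abs(tau - (diff < 0).float())  # expectile weighting
        v_loss = (weight * diff ** 2).mean()
        return q_loss + v_loss

    def stage3_loss(self, s, a):
        # Feasibility-gated, advantage-weighted policy extraction. FISOR uses
        # a guided diffusion model here; a weighted behavior-cloning head
        # stands in to keep the sketch self-contained.
        sa = torch.cat([s, a], dim=-1)
        adv = (self.q(sa) - self.v(s)).detach()
        feasible = (self.h_value(s).detach() <= 0).float()  # inside feasible region
        w = feasible * torch.exp(adv).clamp(max=100.0)
        bc_error = ((self.policy(s) - a) ** 2).mean(dim=-1, keepdim=True)
        return (w * bc_error).mean()
```

The point of the sketch is the decoupling the abstract describes: each stage is a separate supervised-style objective on the fixed dataset, so the feasibility value (stage 1) and the advantage (stage 2) can be trained to convergence before policy extraction (stage 3), which only imitates in-dataset actions that are both feasible and high-advantage.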