28 Aug 2024 | Minjong Yoo, Sangwoo Cho, Honguk Woo*
This paper presents a novel multi-task offline reinforcement learning (RL) model that addresses the challenge of learning from heterogeneous datasets of varying quality. The model employs a skill-based approach to decompose tasks into achievable subtasks and uses quality-aware regularization to ensure robust performance. Key contributions include:
1. **Skill-regularized Task Decomposition**: The model jointly learns skill and task embeddings in a shared latent space, where common skills guide the decomposition of tasks into subtasks. The decomposition is learned with a Wasserstein auto-encoder (WAE) together with a quality-weighted loss term that keeps subtask embeddings aligned with high-quality skills (a minimal sketch of this loss follows the list below).
2. **Data Augmentation**: To improve the performance of offline RL agents, the model augments datasets with imaginary trajectories generated from high-quality skills. This augmentation produces plausible trajectories and improves the agents' adaptability on tasks with limited or low-quality data (an augmentation sketch also follows the list).
3. **Experimental Evaluation**: The model is evaluated on robotic manipulation tasks in the Meta-World environment and drone navigation tasks in the AirSim simulator. Results show that the proposed model outperforms state-of-the-art algorithms and remains robust across mixed configurations of different-quality datasets.
4. **Ablation Study**: The effectiveness of the proposed techniques is demonstrated through ablation studies, showing that both skill regularization and data augmentation significantly improve performance.
5. **Conclusion**: The paper concludes by highlighting the potential for future work, including exploring the hierarchy of skill representation and improving the model's adaptability to different temporal abstraction levels.
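The sketch below illustrates one plausible reading of the skill-regularized decomposition loss: a WAE reconstruction term, an MMD penalty matching the latent prior, and a quality-weighted term pulling each subtask embedding toward its nearest skill embedding. The names (`SubtaskWAE`, `skill_bank`, `quality_w`) and the exact form of the regularizer are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of skill-regularized task decomposition with a Wasserstein
# auto-encoder (WAE). All module and variable names are illustrative.
import torch
import torch.nn as nn


def mmd_penalty(z, prior_samples, sigma=1.0):
    """RBF-kernel MMD between encoded latents and prior samples (WAE regularizer)."""
    def kernel(a, b):
        d = torch.cdist(a, b).pow(2)
        return torch.exp(-d / (2 * sigma ** 2))
    return (kernel(z, z).mean() + kernel(prior_samples, prior_samples).mean()
            - 2 * kernel(z, prior_samples).mean())


class SubtaskWAE(nn.Module):
    def __init__(self, traj_dim, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(traj_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, traj_dim))

    def loss(self, segments, skill_bank, quality_w, beta=1.0, lam=0.5):
        z = self.enc(segments)                      # subtask embeddings
        recon = ((self.dec(z) - segments) ** 2).mean()
        wae = mmd_penalty(z, torch.randn_like(z))   # match latent prior
        # Quality-weighted regularization: pull each subtask embedding toward
        # its nearest skill embedding, weighted by that skill's data quality,
        # so subtasks stay aligned with high-quality skills.
        dists = torch.cdist(z, skill_bank)          # (batch, num_skills)
        nearest = dists.argmin(dim=1)
        skill_reg = (quality_w[nearest] *
                     dists.gather(1, nearest[:, None]).squeeze(1)).mean()
        return recon + beta * wae + lam * skill_reg
```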
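The augmentation idea can be sketched as rolling out imaginary transitions from skills whose source data exceeds a quality threshold and appending them to the offline dataset. The `skill_policy` and `dynamics_model` interfaces below are assumptions made for illustration, not the paper's exact components.

```python
# Illustrative sketch of skill-based data augmentation with imaginary
# trajectories; the skill-conditioned policy and learned dynamics model
# are assumed interfaces, not the authors' implementation.
import torch


def augment_with_skills(dataset, skill_bank, quality_w, skill_policy,
                        dynamics_model, start_states, horizon=10, q_min=0.8):
    """Append imaginary (s, a, s') transitions generated from high-quality skills."""
    good_skills = skill_bank[quality_w >= q_min]    # keep only high-quality skills
    for z in good_skills:
        s = start_states[torch.randint(len(start_states), (1,))].squeeze(0)
        for _ in range(horizon):
            a = skill_policy(s, z)            # skill-conditioned action proposal
            s_next = dynamics_model(s, a)     # learned model predicts next state
            dataset.append((s.detach(), a.detach(), s_next.detach()))
            s = s_next
    return dataset
```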
Overall, the paper provides a comprehensive solution for multi-task offline RL, effectively leveraging shared knowledge across tasks and robustly handling heterogeneous datasets.