This paper introduces a continuous deep Q-learning algorithm with model-based acceleration, aimed at improving sample efficiency in continuous control tasks. The authors propose a Q-learning variant based on normalized advantage functions (NAF), which enables effective Q-learning in continuous action spaces by decomposing the Q-function into a state value term and an advantage term; the advantage is parameterized as a quadratic function of the action, so the maximizing action can be computed analytically. This simplifies the algorithm and improves performance on simulated robotic control tasks. To further enhance efficiency, the authors explore the use of learned models to accelerate model-free reinforcement learning. They show that iteratively refitted local linear models are particularly effective for this purpose, leading to substantially faster learning in domains where such models are applicable.
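To make the decomposition concrete, the following is a minimal NumPy sketch of how a NAF-style Q-value could be assembled from the quantities a network predicts. The function and variable names are illustrative assumptions; a real implementation would compute these terms inside the Q-network itself rather than from raw arrays.

```python
import numpy as np

def naf_q_value(v, mu, l_entries, action):
    """Sketch of the NAF decomposition Q(s,a) = V(s) + A(s,a).

    v         -- scalar state value V(s) predicted by the network
    mu        -- greedy action mu(s), shape (d,)
    l_entries -- d*(d+1)/2 entries parameterizing a lower-triangular L(s)
    action    -- action a whose Q-value is evaluated, shape (d,)

    The advantage is the quadratic form
        A(s,a) = -1/2 (a - mu)^T P (a - mu),  with  P = L L^T,
    so the Q-function is always maximized at a = mu(s).
    """
    d = mu.shape[0]
    L = np.zeros((d, d))
    rows, cols = np.tril_indices(d)
    L[rows, cols] = l_entries
    # Exponentiate the diagonal so that P = L L^T is positive definite.
    L[np.diag_indices(d)] = np.exp(np.diag(L))
    P = L @ L.T
    diff = action - mu
    advantage = -0.5 * diff @ P @ diff
    return v + advantage
```

Because the advantage is non-positive and vanishes at a = mu(s), the greedy action is available in closed form, which is what makes Q-learning tractable without a separate actor network.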
The paper also presents an approach to incorporating learned models into continuous-action Q-learning through imagination rollouts: on-policy samples generated under the learned model, in the spirit of the Dyna-Q algorithm. The authors demonstrate that this approach is highly effective when the learned dynamics model is accurate, but that its benefit degrades quickly as model errors grow. However, they show that iteratively fitting local linear models to the latest batch of on-policy or off-policy rollouts provides sufficient local accuracy to achieve substantial improvements using short imagination rollouts.
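As an illustration of the idea (not the paper's exact procedure, which refits time-varying linear-Gaussian models), the sketch below fits a single linear dynamics model to a recent batch of transitions by least squares and then generates short synthetic rollouts under the current policy. The helper names, the additive exploration noise, and the time-invariant model are all simplifying assumptions.

```python
import numpy as np

def fit_linear_model(states, actions, next_states):
    """Least-squares fit of a linear model s' ~ A s + B a + c to the
    latest batch of transitions. A simplified stand-in for the paper's
    iteratively refitted local linear models."""
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    ds, da = states.shape[1], actions.shape[1]
    A, B, c = W[:ds].T, W[ds:ds + da].T, W[-1]
    return A, B, c

def imagination_rollouts(model, policy, start_states, horizon, noise_std=0.1):
    """Generate short on-policy rollouts under the learned model
    (hypothetical helper). The synthetic transitions would be added to
    the replay buffer for extra Q-learning updates, as in Dyna-Q;
    rewards would come from a known or learned reward function
    (omitted here)."""
    A, B, c = model
    synthetic = []
    for s in start_states:
        for _ in range(horizon):
            a = policy(s) + noise_std * np.random.randn(B.shape[1])
            s_next = A @ s + B @ a + c
            synthetic.append((s, a, s_next))
            s = s_next
    return synthetic
```

Keeping the rollouts short limits how far model errors can compound, which is consistent with the paper's finding that short imagination rollouts from locally accurate models give the most reliable gains.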
The authors evaluate their method on a series of simulated robotic tasks and compare it to prior methods. They find that NAF outperforms DDPG on most tasks, particularly manipulation tasks that require precision, and that it is more sample-efficient, a property especially important for eventual application to real-world robotic learning. The paper also discusses the benefits and limitations of model-based acceleration, showing that while it can significantly improve sample efficiency, the gains do not carry over to every task. The authors conclude that combining model-based and model-free learning can lead to more efficient and effective reinforcement learning in continuous control domains.