Generative Adversarial Imitation Learning

10 Jun 2016 | Jonathan Ho, Stefano Ermon
This paper proposes a new framework for learning policies directly from expert demonstrations, without further interaction with the expert and without access to a reinforcement signal. The framework draws an analogy between imitation learning and generative adversarial networks (GANs), leading to a model-free imitation learning algorithm that outperforms existing methods at imitating complex behaviors in large, high-dimensional environments. The algorithm is derived from a particular choice of cost regularizer whose induced objective drives the learned policy to match the expert's occupancy measure; specifically, it minimizes the Jensen-Shannon divergence between the occupancy measures of the learner and the expert.

Evaluated on a range of physics-based control tasks, the method shows significant performance improvements over baselines such as behavioral cloning, feature expectation matching, and game-theoretic apprenticeship learning. Because it is model-free, it scales to large environments and complex tasks. The paper also develops the theoretical foundations of the approach, including the connection between imitation learning and GANs and the use of convex optimization techniques to derive the algorithm. The results demonstrate that the proposed method imitates expert behavior with high fidelity, even from limited demonstration data.
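Concretely, the GAN analogy can be made precise: the algorithm (GAIL) seeks a saddle point of an objective in which a discriminator D is trained to distinguish state-action pairs generated by the learner's policy π from those of the expert policy π_E, while π is trained to fool it. With H(π) denoting the γ-discounted causal entropy of π and λ ≥ 0 its weight, the paper's objective is

    \min_{\pi} \max_{D \in (0,1)^{\mathcal{S} \times \mathcal{A}}}
        \mathbb{E}_{\pi}\big[\log D(s, a)\big]
        + \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s, a)\big)\big]
        - \lambda H(\pi)

The paper alternates a gradient step on D with a TRPO step on π, using log D(s, a) as the policy's cost. The sketch below illustrates that alternating structure in PyTorch. It is a minimal illustration under stated assumptions, not the paper's implementation: the placeholder dimensions, network sizes, and the substitution of a plain REINFORCE update (with per-step rewards rather than discounted returns) for TRPO are all simplifications.

    # Minimal GAIL-style training sketch (illustrative, not the paper's exact
    # implementation). Assumptions: flat observations, discrete actions, and
    # batches of (state, action) pairs already collected from the current
    # policy and from the expert. The paper's TRPO policy step is replaced
    # here by a vanilla REINFORCE step for brevity.
    import torch
    import torch.nn as nn

    obs_dim, n_actions = 4, 2  # placeholder sizes, e.g. CartPole-v1

    policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                           nn.Linear(64, n_actions))      # action logits
    disc = nn.Sequential(nn.Linear(obs_dim + n_actions, 64), nn.Tanh(),
                         nn.Linear(64, 1))                # logits of D(s, a)

    pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    d_opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
    bce = nn.BCEWithLogitsLoss()

    def one_hot(a):
        return torch.eye(n_actions)[a]

    def gail_step(agent_s, agent_a, expert_s, expert_a, lam=1e-3):
        """One alternating update: fit the discriminator, then the policy."""
        agent_sa = torch.cat([agent_s, one_hot(agent_a)], dim=1)
        expert_sa = torch.cat([expert_s, one_hot(expert_a)], dim=1)

        # Discriminator step: label agent samples 1 and expert samples 0,
        # so minimizing BCE maximizes E_pi[log D] + E_piE[log(1 - D)].
        d_loss = bce(disc(agent_sa), torch.ones(len(agent_sa), 1)) + \
                 bce(disc(expert_sa), torch.zeros(len(expert_sa), 1))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()

        # Policy step: REINFORCE on the surrogate reward -log D(s, a)
        # (minimizing the cost log D), plus a causal-entropy bonus.
        with torch.no_grad():
            reward = -torch.log(torch.sigmoid(disc(agent_sa)) + 1e-8).squeeze(1)
        dist = torch.distributions.Categorical(logits=policy(agent_s))
        pi_loss = -(dist.log_prob(agent_a) * reward).mean() \
                  - lam * dist.entropy().mean()
        pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

    # Example call with random placeholder batches:
    # s = torch.randn(32, obs_dim); a = torch.randint(n_actions, (32,))
    # es = torch.randn(32, obs_dim); ea = torch.randint(n_actions, (32,))
    # gail_step(s, a, es, ea)

In practice, the agent's (state, action) batches would come from rolling out the current policy in the environment, and the policy update would use TRPO (or another trust-region method) on the full discounted return of the surrogate reward, as in the paper.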