SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

25 Aug 2017 | Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu
SeqGAN is a novel framework for sequence generation that addresses the limitations of traditional Generative Adversarial Networks (GANs) in generating sequences of discrete tokens. GANs struggle with discrete outputs because gradient updates cannot pass from the discriminative model back to the generative model. SeqGAN sidesteps this generator-differentiation problem by modeling the generator as a stochastic policy in reinforcement learning (RL) and updating it directly with policy gradients. The RL reward signal comes from the GAN discriminator evaluated on a complete sequence and is passed back to intermediate state-action steps via Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines, including maximum likelihood estimation, scheduled sampling, and policy gradient with BLEU. SeqGAN excels at generating creative sequences such as poems, political speeches, and music, outperforming the baselines on a range of metrics, including human expert judgment. The paper also examines the stability and robustness of SeqGAN, highlighting the importance of the training strategy and the approach's effectiveness for large-scale data and long-term planning.
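To make the training signal concrete, here is a minimal, hypothetical sketch of the policy-gradient update with Monte Carlo rollouts, in Python/NumPy. It is not the paper's implementation: the LSTM generator is replaced by a toy per-position softmax policy and the trained CNN discriminator by a hand-coded scoring function. Only the structure mirrors SeqGAN: estimate Q(state, action) for each prefix by averaging the discriminator's score over rollouts that complete the sequence, then apply a REINFORCE-style update.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T, N_ROLLOUTS, LR = 8, 6, 16, 0.1

# Toy generator: per-position softmax policy (a simplification;
# the paper conditions an LSTM on the generated prefix).
logits = np.zeros((T, V))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_sequence():
    return [rng.choice(V, p=softmax(logits[t])) for t in range(T)]

def rollout(prefix):
    # Complete a partial sequence by sampling the remaining tokens.
    seq = list(prefix)
    for t in range(len(prefix), T):
        seq.append(rng.choice(V, p=softmax(logits[t])))
    return seq

def discriminator(seq):
    # Stand-in for D(y_1:T): rewards sequences whose tokens count upward.
    # In SeqGAN this is a trained CNN's probability that the sequence is real.
    return float(np.mean([seq[i] <= seq[i + 1] for i in range(T - 1)]))

def action_value(prefix, n=N_ROLLOUTS):
    # Q(s, a) via Monte Carlo search: average discriminator score
    # over n rollouts that complete the given prefix.
    return np.mean([discriminator(rollout(prefix)) for _ in range(n)])

for step in range(200):
    seq = sample_sequence()
    for t in range(T):
        # Final step is scored directly; intermediate steps use rollouts.
        q = discriminator(seq) if t == T - 1 else action_value(seq[:t + 1])
        p = softmax(logits[t])
        grad = -p
        grad[seq[t]] += 1.0           # d log pi(a_t) / d logits
        logits[t] += LR * q * grad    # REINFORCE update weighted by Q

print("sample after training:", sample_sequence())
```

The rollouts are what let a reward defined only on complete sequences supervise every intermediate token: averaging over many completions reduces the variance of the per-step reward estimate, at the cost of extra sampling.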