A Deep Reinforced Model for Abstractive Summarization

13 Nov 2017 | Romain Paulus, Caiming Xiong & Richard Socher
This paper introduces a deep reinforced model for abstractive summarization that achieves state-of-the-art results on the CNN/Daily Mail and New York Times datasets. The model combines a novel intra-attention mechanism, which attends separately to the input document and to the continuously generated output, with a new training procedure that mixes standard supervised word prediction and reinforcement learning (RL). This combination reduces exposure bias and improves the readability of the generated summaries.

Architecturally, the model uses a bidirectional LSTM encoder and a single LSTM decoder with shared embeddings; intra-decoder attention over previously generated tokens helps the model avoid repeating itself. The training objective combines a maximum-likelihood loss with an RL reward to improve overall summary quality.

On the CNN/Daily Mail dataset the model outperforms previous state-of-the-art systems, reaching a ROUGE-1 score of 41.16, and human evaluation confirms higher-quality, more readable summaries. The model also performs well on the New York Times dataset. These results show that the combined training objective and intra-attention mechanism are effective for abstractive summarization, particularly for long documents, and that the model handles long output sequences better than extractive baselines.

The paper also reviews related work on neural encoder-decoder models, reinforcement learning for sequence generation, and text summarization, and highlights the importance of using multiple metrics when evaluating summarization models.
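To make the intra-decoder attention described above more concrete, here is a minimal PyTorch sketch of attending over the decoder's own previous hidden states. The paper uses a learned bilinear scoring function; this sketch simplifies it to a plain dot product, and the function and argument names (`intra_decoder_attention`, `h_t`, `prev_states`) are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def intra_decoder_attention(h_t, prev_states):
    """Sketch of intra-decoder attention (simplified from Paulus et al., 2017).

    h_t:         current decoder hidden state, shape (batch, dim)
    prev_states: previously generated decoder states, shape (batch, t-1, dim)

    Returns a context vector over the decoder's own history, which the model
    combines with the encoder context when predicting the next token, helping
    it penalize re-attending to (and thus repeating) earlier output.
    """
    # Attention scores of the current state against each previous decoder state.
    # (The paper uses a bilinear form with a learned matrix; a dot product is
    # used here for brevity.)
    scores = torch.bmm(prev_states, h_t.unsqueeze(2)).squeeze(2)      # (batch, t-1)
    alpha = F.softmax(scores, dim=-1)                                 # attention weights
    context = torch.bmm(alpha.unsqueeze(1), prev_states).squeeze(1)   # (batch, dim)
    return context
```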
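The mixed training objective can likewise be sketched in a few lines. The RL term is a self-critical policy gradient: a summary sampled from the decoder is rewarded (the paper uses ROUGE), with the greedily decoded summary serving as the baseline, and this term is blended with the usual teacher-forced maximum-likelihood loss. The function name and the exact tensor shapes below are assumptions for illustration; the mixing weight is set close to 1, in line with the value the paper reports.

```python
import torch

def mixed_objective(sample_log_prob, sample_reward, greedy_reward, ml_loss,
                    gamma=0.9984):
    """Sketch of the mixed ML + RL loss, L = gamma * L_rl + (1 - gamma) * L_ml.

    sample_log_prob: summed log-probability of a summary sampled from the
                     decoder's output distribution, per batch element.
    sample_reward:   reward (e.g. ROUGE) of the sampled summary.
    greedy_reward:   reward of the greedily decoded summary, used as the
                     self-critical baseline.
    ml_loss:         standard teacher-forced negative log-likelihood loss.
    gamma:           mixing weight; the paper weights the RL term heavily.
    """
    # If the sampled summary beats the greedy baseline, the advantage is
    # negative, so minimizing the loss increases the sample's probability.
    advantage = greedy_reward - sample_reward
    rl_loss = (advantage * sample_log_prob).mean()
    return gamma * rl_loss + (1.0 - gamma) * ml_loss
```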