10 Jul 2020 | Jingqing Zhang*, Yao Zhao*, Mohammad Saleh, Peter J. Liu
PEGASUS is a pre-training method for abstractive text summarization that uses gap-sentence generation as a self-supervised objective. Important sentences are removed or masked from a document, and the model must generate them from the remaining sentences, so the pre-training target functions much like an extractive summary of the document.

The model was pre-trained on large text corpora, C4 and HugeNews, and achieved state-of-the-art ROUGE performance on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills, including XSum, CNN/DailyMail, WikiHow, and Reddit TIFU. Gap-sentence generation was also combined with a masked language modeling objective during pre-training, but masked language modeling did not improve downstream results after longer pre-training and was dropped from the final model.

PEGASUS also adapted quickly to new datasets with minimal supervision: fine-tuned on only 1,000 examples, it surpassed previous state-of-the-art results on six datasets. Human evaluation studies confirmed that its summaries reached human-level performance on multiple datasets. Overall, PEGASUS delivered strong results across a wide range of summarization tasks and demonstrated the effectiveness of gap-sentence generation as a pre-training objective for abstractive summarization.
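As a rough illustration of the gap-sentence generation objective, the sketch below scores each sentence against the rest of the document with a simple unigram-overlap stand-in for ROUGE-1, masks the top-scoring sentences, and returns the masked document as the model input and the removed sentences as the generation target. The sentence splitter, the mask token string, and the overlap scorer are simplifications assumed for illustration; the paper selects principal sentences with ROUGE-F1-based independent or sequential strategies and trains a Transformer encoder-decoder on the resulting pairs.

```python
import re

MASK_TOKEN = "[MASK1]"  # placeholder gap-sentence token (name assumed for illustration)

def split_sentences(text):
    # Naive sentence splitter; a real pipeline would use a proper tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def overlap_f1(candidate, reference):
    # Unigram-overlap F1 as a crude stand-in for ROUGE-1: each sentence is
    # scored against the remainder of the document.
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = len(set(cand) & set(ref))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def gap_sentence_generation(document, gap_ratio=0.3):
    """Build one (input, target) pre-training pair via gap-sentence generation.

    Scores sentences independently against the rest of the document, masks
    the top-scoring ones, and uses the removed sentences, in document order,
    as the target the model must generate.
    """
    sentences = split_sentences(document)
    n_gaps = max(1, int(len(sentences) * gap_ratio))

    scores = []
    for i, sent in enumerate(sentences):
        rest = " ".join(sentences[:i] + sentences[i + 1:])
        scores.append((overlap_f1(sent, rest), i))

    selected = {i for _, i in sorted(scores, reverse=True)[:n_gaps]}

    masked_input = " ".join(
        MASK_TOKEN if i in selected else s for i, s in enumerate(sentences)
    )
    target = " ".join(s for i, s in enumerate(sentences) if i in selected)
    return masked_input, target

if __name__ == "__main__":
    doc = ("PEGASUS pre-trains on documents with important sentences removed. "
           "The model must regenerate those sentences from the remaining text. "
           "This resembles producing an abstractive summary of the document.")
    src, tgt = gap_sentence_generation(doc, gap_ratio=0.3)
    print("INPUT :", src)
    print("TARGET:", tgt)
```

In the actual pre-training setup, pairs like these are produced at scale from the unlabeled corpus and fed to a standard sequence-to-sequence Transformer, so the only task-specific ingredient is the sentence-selection heuristic.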