Text Summarization with Pretrained Encoders

5 Sep 2019 | Yang Liu and Mirella Lapata
This paper explores the application of Bidirectional Encoder Representations from Transformers (BERT) to text summarization, both extractive and abstractive. The authors propose a novel document-level encoder based on BERT that encodes a document and obtains representations for its sentences. For extractive summarization, several inter-sentence Transformer layers are stacked on top of the BERT-based document encoder to capture document-level features and score sentences for inclusion in the summary. For abstractive summarization, the model uses an encoder-decoder architecture in which the pretrained encoder is paired with a randomly initialized Transformer decoder; a new fine-tuning schedule uses separate optimizers for the encoder and decoder to alleviate the mismatch between the pretrained encoder and the randomly initialized decoder. A two-stage fine-tuning approach is also proposed, where the encoder is first fine-tuned on the extractive task and then on the abstractive task. Experiments on three datasets (CNN/DailyMail, the New York Times Annotated Corpus, and XSum) show that the proposed models achieve state-of-the-art results in both extractive and abstractive summarization. The contributions of the work include highlighting the importance of document encoding, demonstrating the effective use of pretrained language models in summarization, and providing a stepping stone for further improvements in summarization performance.
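To make the architecture described above concrete, here is a minimal sketch (not the authors' released code) of a BERTSUM-style extractive model: a [CLS] token is inserted before every sentence, the corresponding BERT output vectors serve as sentence representations, inter-sentence Transformer layers are stacked on top, and a sigmoid classifier scores each sentence. It assumes PyTorch and the HuggingFace `transformers` package with the `bert-base-uncased` checkpoint; the layer counts, learning rates, and the `build_optimizers` helper are illustrative stand-ins, and the paper's interval segment embeddings and extended position embeddings are omitted for brevity.

```python
# Minimal sketch of a BERTSUM-style document encoder with an extractive scoring head.
# Assumptions: PyTorch, HuggingFace `transformers`, and 'bert-base-uncased'.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class ExtractiveSummarizer(nn.Module):
    def __init__(self, num_inter_layers: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        # Inter-sentence Transformer layers stacked on top of BERT (document-level encoder).
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.inter_sentence = nn.TransformerEncoder(layer, num_layers=num_inter_layers)
        # Sigmoid classifier scoring each sentence for inclusion in the summary.
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # input_ids: (batch, seq_len) with a [CLS] token inserted before every sentence.
        # cls_positions: (batch, num_sents) indices of those [CLS] tokens.
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # Gather the [CLS] vector of each sentence as its representation.
        idx = cls_positions.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        sent_vecs = hidden.gather(1, idx)           # (batch, num_sents, hidden)
        sent_vecs = self.inter_sentence(sent_vecs)  # inter-sentence interactions
        return torch.sigmoid(self.classifier(sent_vecs)).squeeze(-1)  # per-sentence scores


def build_optimizers(model, enc_lr=2e-3, dec_lr=0.1):
    """Illustration of the paper's two-optimizer idea for abstractive fine-tuning:
    a smaller learning rate (and longer warmup, not shown) for the pretrained encoder,
    a larger one for the randomly initialized parameters. Here the non-BERT parameters
    stand in for the 'decoder' purely for illustration."""
    enc_params = list(model.bert.parameters())
    dec_params = [p for n, p in model.named_parameters() if not n.startswith("bert.")]
    return (torch.optim.Adam(enc_params, lr=enc_lr),
            torch.optim.Adam(dec_params, lr=dec_lr))


if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    doc = ["The cat sat on the mat.", "It was a sunny day."]
    # Wrap every sentence in [CLS] ... [SEP] so each sentence gets its own [CLS] vector.
    text = " ".join(f"[CLS] {s} [SEP]" for s in doc)
    enc = tokenizer(text, return_tensors="pt", add_special_tokens=False)
    cls_positions = (enc["input_ids"][0] == tokenizer.cls_token_id) \
        .nonzero(as_tuple=True)[0].unsqueeze(0)

    model = ExtractiveSummarizer()
    scores = model(enc["input_ids"], enc["attention_mask"], cls_positions)
    print(scores)  # one inclusion score per sentence
```

In the paper's abstractive setting, both encoder and decoder use Adam with warmup-based schedules, with the encoder trained at a much smaller peak learning rate than the decoder; the sketch above only shows the split into two optimizers, not the full schedule.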