This paper presents a novel approach to text summarization using the BERT model. The authors propose a document-level encoder based on BERT that captures the semantics of a document and produces sentence-level representations. They introduce two models: an extractive model that applies stacked Transformer layers over these sentence representations to select sentences, and an abstractive model that employs an encoder-decoder architecture with a randomly initialized decoder. For the abstractive model, they design a new fine-tuning schedule that uses separate optimizers for the encoder and decoder, addressing the mismatch between the pretrained encoder and the untrained decoder. They also propose a two-stage fine-tuning approach that first fine-tunes the encoder on the extractive task and then on the abstractive task. Experiments on three datasets show that their models achieve state-of-the-art results in both extractive and abstractive settings. The authors highlight the importance of document encoding for summarization and demonstrate how pretrained language models can be used effectively for both styles of summarization, arguing that their models can serve as a foundation for further improvements in summarization performance. The paper also reports human evaluations, which confirm the effectiveness of the approach.
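
To make the extractive setup more concrete, below is a minimal sketch (not the authors' released code), assuming PyTorch and the Hugging Face `transformers` library. The idea, as described in the summary, is that a vector is taken from the encoder for each sentence, refined by a small stack of Transformer layers that model document-level interactions, and scored for extraction. Class and argument names (`ExtractiveSummarizer`, `cls_positions`) and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import BertModel


class ExtractiveSummarizer(nn.Module):
    """Sketch: per-sentence vectors from BERT are refined by stacked
    Transformer layers and scored with a sigmoid for extraction."""

    def __init__(self, num_layers: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.sentence_encoder = nn.TransformerEncoder(layer, num_layers)
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # Token-level contextual embeddings from the pretrained encoder.
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        # Gather the vector at each sentence-leading position
        # (cls_positions: one index per sentence, per document).
        batch_idx = torch.arange(hidden.size(0)).unsqueeze(-1)
        sent_vecs = hidden[batch_idx, cls_positions]
        # Stacked Transformer layers capture interactions between sentences.
        sent_vecs = self.sentence_encoder(sent_vecs)
        # One extraction probability per sentence.
        return torch.sigmoid(self.scorer(sent_vecs)).squeeze(-1)
```

For the abstractive model, the same pretrained encoder would be paired with a randomly initialized Transformer decoder, and the fine-tuning schedule described above would then use two optimizers, for example a smaller learning rate with a longer warmup for the encoder and a larger one for the decoder, so that the pretrained weights are not destabilized while the decoder is still learning. The specific learning rates and warmup values are not given in this summary, so any concrete numbers should be taken from the original paper.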