29 Oct 2019 | Mike Lewis*, Yinhan Liu*, Naman Goyal*, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
BART is a denoising autoencoder for pre-training sequence-to-sequence models, designed to improve performance on a wide range of natural language processing tasks. It uses a standard Transformer-based architecture that combines a bidirectional encoder with an autoregressive decoder. BART is pre-trained by corrupting text with various noising functions and learning to reconstruct the original text. The best performance comes from randomly shuffling the order of sentences together with a novel in-filling scheme, in which spans of text are replaced with a single mask token. BART is effective on both text generation and comprehension tasks: it matches the performance of RoBERTa on GLUE and SQuAD while achieving state-of-the-art results on abstractive dialogue, question answering, and summarization tasks. It also improves machine translation by 1.1 BLEU over a back-translation system. Ablation experiments that replicate other pre-training objectives within the BART framework show that BART performs consistently well across tasks.
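To make the best-performing noising step concrete, here is a minimal Python sketch of the two corruptions mentioned above: sentence permutation and text in-filling with spans replaced by a single mask token. The Poisson span lengths (λ = 3) and the 30% masking rate follow the paper, but the whitespace tokenization, the `<mask>` string, and the per-position decision rule are illustrative simplifications, not the authors' actual implementation (which operates on subword tokens).

```python
import random
import numpy as np

def sentence_permutation(sentences, rng=random):
    """Sentence permutation: shuffle the order of a document's sentences."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled

def text_infilling(tokens, mask_ratio=0.3, lam=3.0, mask_token="<mask>"):
    """Text in-filling (sketch): sample spans with Poisson(lambda=3) lengths and
    replace each span with a single mask token, until roughly `mask_ratio`
    of the tokens have been masked."""
    num_to_mask = int(round(len(tokens) * mask_ratio))
    out, i, masked = [], 0, 0
    while i < len(tokens):
        if masked < num_to_mask and random.random() < mask_ratio:
            span = np.random.poisson(lam)
            out.append(mask_token)   # one mask token stands in for the whole span
            i += span                # a 0-length span simply inserts a mask token
            masked += span
        else:
            out.append(tokens[i])
            i += 1
    return out

# Toy example: corrupt a two-sentence document, which BART would then
# be trained to reconstruct with its autoregressive decoder.
doc = ["the cat sat on the mat .", "it was a sunny day ."]
corrupted = text_infilling(" ".join(sentence_permutation(doc)).split())
print(corrupted)
```

Because each span is collapsed into a single mask token, the model must also predict how many tokens are missing, which is what distinguishes in-filling from BERT-style per-token masking.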