2019 | Kaitao Song*¹, Xu Tan*², Tao Qin², Jianfeng Lu¹, Tie-Yan Liu²
MASS (Masked Sequence to Sequence Pre-training) is a novel pre-training method designed for encoder-decoder based language generation tasks. Inspired by BERT, MASS reconstructs a sentence fragment given the remaining part of the sentence: the encoder takes a sentence with a randomly masked fragment (several consecutive tokens) as input, and the decoder predicts this masked fragment. This approach jointly trains the encoder and decoder to develop representation extraction and language modeling capabilities. MASS is pre-trained on a monolingual corpus and then fine-tuned on various zero- and low-resource language generation tasks, including neural machine translation, text summarization, and conversational response generation. Experiments on multiple datasets show that MASS achieves significant improvements over baselines, reaching state-of-the-art accuracy in unsupervised English-French translation. The method's effectiveness in both zero- and low-resource settings makes it a versatile solution for a wide range of sequence generation tasks.
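To make the objective concrete, below is a minimal sketch of how a MASS training example can be constructed; it is not the authors' implementation. The names `mass_example` and `MASK_ID` are hypothetical, token IDs are plain integers, and the default fragment ratio of ~50% of the sentence follows the setting the paper reports working well.

```python
# Minimal sketch of constructing a MASS training example (assumptions:
# integer token IDs, a hypothetical MASK_ID special token).
import random

MASK_ID = 0  # hypothetical [MASK] token id


def mass_example(tokens, frag_ratio=0.5):
    """Build (encoder_input, decoder_input, target) for one sentence.

    The encoder sees the sentence with a contiguous fragment replaced by
    [MASK]; the decoder reconstructs that fragment, conditioning on the
    encoder output and on the fragment tokens generated so far.
    """
    n = len(tokens)
    frag_len = max(1, int(n * frag_ratio))
    start = random.randint(0, n - frag_len)   # fragment start position
    fragment = tokens[start:start + frag_len]

    # Encoder input: the sentence with the fragment masked out.
    enc_input = (tokens[:start]
                 + [MASK_ID] * frag_len
                 + tokens[start + frag_len:])

    # Decoder input: the fragment shifted right by one, so each target
    # token is predicted only from the fragment tokens before it.
    dec_input = [MASK_ID] + fragment[:-1]

    return enc_input, dec_input, fragment


if __name__ == "__main__":
    sentence = [11, 12, 13, 14, 15, 16, 17, 18]  # toy token IDs
    enc, dec, tgt = mass_example(sentence)
    print(enc)  # e.g. [11, 12, 0, 0, 0, 0, 17, 18]
    print(dec)  # e.g. [0, 13, 14, 15, 16]
    print(tgt)  # e.g. [13, 14, 15, 16]
```

Masking a contiguous fragment, rather than scattered individual tokens as in BERT, is what forces the decoder to develop language modeling ability while relying on the encoder for the unmasked context, which is why the two components benefit jointly from pre-training.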