MASS: Masked Sequence to Sequence Pre-training for Language Generation


2019 | Kaitao Song*¹, Xu Tan*², Tao Qin², Jianfeng Lu¹, Tie-Yan Liu²
MASS is a pre-training method for language generation that jointly trains the encoder and decoder of a sequence-to-sequence framework. A random contiguous fragment of a sentence is masked: the encoder takes the sentence with the masked fragment as input, and the decoder predicts the masked fragment conditioned on the encoder's output. In this way the encoder learns to extract representations from the unmasked context while the decoder learns language modeling, and the two are trained together rather than separately.

This joint pre-training distinguishes MASS from previous methods that pre-train only the encoder or only the decoder, and it makes the approach directly applicable to a wide range of sequence generation tasks. The model is pre-trained on monolingual data and then fine-tuned on downstream generation tasks, including neural machine translation, text summarization, and conversational response generation, on datasets such as WMT and Gigaword. MASS yields significant improvements over baselines on all of these tasks in both zero- and low-resource settings, and achieves a state-of-the-art BLEU score of 37.5 on unsupervised English-French translation. The main contributions are the MASS pre-training objective itself, its application to a variety of language generation tasks, and the empirical demonstration of its effectiveness.
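To make the objective concrete, the following is a minimal sketch, in Python, of how a MASS-style training example can be built: a random contiguous fragment is replaced by mask tokens on the encoder side, and the decoder is trained to predict that fragment with its own input shifted by one position. The function name mass_mask, the [M] mask symbol, and the 50% mask ratio are illustrative assumptions, not the authors' implementation.

```python
import random

MASK = "[M]"  # placeholder mask symbol (assumption, not the paper's vocabulary)

def mass_mask(tokens, mask_ratio=0.5, seed=None):
    """Build one MASS-style example from a list of tokens.

    Returns (encoder_input, decoder_input, decoder_target): the encoder
    sees the sentence with a random contiguous fragment masked out, and
    the decoder predicts that fragment.
    """
    rng = random.Random(seed)
    n = len(tokens)
    k = max(1, int(n * mask_ratio))      # fragment length
    start = rng.randint(0, n - k)        # fragment start position

    # Encoder input: the fragment is replaced by mask tokens.
    encoder_input = tokens[:start] + [MASK] * k + tokens[start + k:]

    # Decoder target: the masked fragment itself.
    decoder_target = tokens[start:start + k]

    # Decoder input: the fragment shifted right by one position, so each
    # step conditions only on previously seen fragment tokens
    # (a mask symbol stands in for the begin-of-fragment token here).
    decoder_input = [MASK] + decoder_target[:-1]

    return encoder_input, decoder_input, decoder_target

if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    enc, dec_in, dec_out = mass_mask(sentence, mask_ratio=0.5, seed=0)
    print("encoder input :", enc)
    print("decoder input :", dec_in)
    print("decoder target:", dec_out)
```

Training a sequence-to-sequence model to maximize the likelihood of decoder_target given encoder_input and decoder_input recovers the MASS objective; masking the entire sentence reduces it to standard language-model pre-training of the decoder, while masking a single token resembles masked-language-model pre-training of the encoder.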