MPNet: Masked and Permuted Pre-training for Language Understanding

2 Nov 2020 | Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu
MPNet is a novel pre-training method that improves upon both BERT and XLNet. BERT uses masked language modeling (MLM) but ignores dependencies among the predicted tokens, while XLNet uses permuted language modeling (PLM) but lacks full position information for the sentence. MPNet combines the strengths of both: it adopts permuted language modeling to capture dependencies among predicted tokens and incorporates full position information to reduce the discrepancy between pre-training and fine-tuning. Concretely, it uses two-stream self-attention together with position compensation (mask tokens that carry the positions of the tokens to be predicted) so that the model sees position information for the full sentence during pre-training.

MPNet is pre-trained on a large-scale corpus (over 160GB of text) and fine-tuned on a variety of downstream tasks, including GLUE, SQuAD, RACE, and IMDB. Experimental results show that MPNet outperforms MLM and PLM as pre-training objectives and surpasses BERT, XLNet, and RoBERTa, with consistent gains across GLUE, SQuAD, RACE, and IMDB and particularly clear improvements on tasks that rely on full-sentence position information. Extensive experiments validate MPNet's effectiveness for language understanding. The code and pre-trained models are available at https://github.com/microsoft/MPNet.
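To make the permuted objective and position compensation concrete, here is a minimal Python sketch of how such an input could be arranged. It is not the authors' implementation (see the GitHub repository for that); the function name build_mpnet_style_input and the default 15% prediction ratio are illustrative assumptions.

import random

MASK = "[MASK]"

def build_mpnet_style_input(tokens, predict_ratio=0.15, seed=0):
    # Number of tokens to predict (at least one).
    n = len(tokens)
    c = max(1, int(n * predict_ratio))
    # Permuted language modeling: pick a random order of positions.
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    non_pred, pred = perm[:n - c], perm[n - c:]
    # Position compensation: the observed tokens are followed by one [MASK]
    # placeholder per predicted position, so all n positions (and hence the
    # full sentence) remain visible to the model.
    content_tokens = [tokens[i] for i in non_pred] + [MASK] * c
    content_positions = non_pred + pred
    # The predicted tokens are the training targets, generated in permuted
    # order; each conditions on the observed tokens, the remaining masks,
    # and previously predicted tokens (handled by two-stream attention).
    targets = [(pos, tokens[pos]) for pos in pred]
    return content_tokens, content_positions, targets

example = ["the", "movie", "was", "surprisingly", "good"]
print(build_mpnet_style_input(example, predict_ratio=0.4))

In the actual model, the mask placeholders reuse the position embeddings of the tokens they stand in for, which is what allows the two-stream attention to see a full-length sentence while still predicting the held-out tokens autoregressively in the permuted order.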