2 Nov 2020 | Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu
MPNet is a pre-training method for language understanding that combines the strengths of BERT's masked language modeling (MLM) and XLNet's permuted language modeling (PLM) while addressing their respective limitations. Unlike BERT, which neglects dependencies among the predicted tokens, and XLNet, which does not see position information for the full sentence, MPNet leverages both the dependencies among predicted tokens and full position information. It introduces position compensation so that the model conditions on the positions of the entire sentence during pre-training, reducing the discrepancy between pre-training and fine-tuning. MPNet is pre-trained on a large-scale corpus (over 160 GB of text) and fine-tuned on a variety of downstream tasks (GLUE, SQuAD, etc.). Experiments show that MPNet outperforms both MLM and PLM by a significant margin and achieves better results than prior state-of-the-art models such as BERT, XLNet, and RoBERTa, with ablation studies on multiple benchmarks confirming the contribution of each component.
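To make the position-compensation idea concrete, below is a minimal Python sketch of how MPNet-style inputs could be constructed: the sequence is permuted as in PLM, the last ~15% of tokens become prediction targets, and [MASK] placeholders carrying the predicted tokens' original positions are appended so that every prediction step sees position information for the full sentence. The function name, the 15% prediction ratio, and the data layout are illustrative assumptions, not the authors' implementation.

```python
import random

MASK = "[MASK]"

def mpnet_inputs(tokens, pred_ratio=0.15, seed=0):
    """Illustrative sketch of MPNet-style input construction.

    Returns (content, positions, targets):
      - content/positions: token and position sequences the encoder sees
      - targets: (position, token) pairs to predict autoregressively
    """
    rng = random.Random(seed)
    n = len(tokens)
    perm = list(range(n))
    rng.shuffle(perm)  # sample a factorization order, as in PLM

    # Split the permutation: the tail becomes the predicted part.
    num_pred = max(1, int(round(n * pred_ratio)))
    non_pred, pred = perm[:-num_pred], perm[-num_pred:]

    # Non-predicted part: original tokens at their original positions.
    content = [tokens[i] for i in non_pred]
    positions = list(non_pred)

    # Position compensation: append a [MASK] for every predicted token,
    # carrying that token's position, so each prediction step conditions
    # on position information for all n tokens of the sentence.
    content += [MASK] * len(pred)
    positions += list(pred)

    # Predicted tokens, in permuted order, so later predictions can
    # depend on earlier ones (unlike MLM's independent predictions).
    targets = [(i, tokens[i]) for i in pred]
    return content, positions, targets

# Example: one short sentence.
content, positions, targets = mpnet_inputs("the task is sentence classification".split())
print(content, positions, targets)
```

The key point the sketch illustrates is that the appended masks give the model the positions of the tokens it will predict, which is exactly the information plain PLM lacks, while predicting tokens in permuted order preserves the inter-token dependencies that plain MLM ignores.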