2 Jan 2020 | Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
XLNet is a generalized autoregressive pretraining method that addresses the limitations of both autoregressive language modeling and autoencoding (BERT-style) pretraining. It enables bidirectional context modeling by maximizing the expected likelihood of a sequence over all permutations of the factorization order, so each position learns from both left and right contexts. Unlike BERT, which corrupts the input with masked tokens and therefore suffers from a pretrain-finetune discrepancy, XLNet avoids this by relying on a permutation-based autoregressive objective that factorizes the joint probability with the standard product rule. XLNet also integrates ideas from Transformer-XL, namely relative positional encoding and the segment recurrence mechanism, to improve performance on tasks involving long text sequences. Empirically, XLNet outperforms BERT on 20 tasks, including question answering, natural language inference, sentiment analysis, and document ranking. The method uses a two-stream self-attention mechanism to obtain target-aware representations, and incorporates a bidirectional data input pipeline and span-based prediction to further improve results. XLNet is evaluated on a range of natural language understanding benchmarks, including GLUE, SQuAD, RACE, and several text classification datasets, demonstrating its effectiveness across a wide range of applications. The Transformer-XL architecture is adapted to work seamlessly with the permutation-based autoregressive objective, and the resulting model achieves substantial improvements over previous pretraining methods.
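In symbols, the permutation language modeling objective summarized above samples a factorization order z from the set Z_T of all permutations of the index sequence [1, ..., T] and maximizes the expected autoregressive log-likelihood under that order:

    \max_\theta \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}\!\left[ \sum_{t=1}^{T} \log p_\theta\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]

Because the same parameters are shared across all factorization orders, each position in expectation conditions on every other position during pretraining, which is what yields bidirectional context without ever masking the input.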
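To make the factorization-order idea concrete, the following is a minimal PyTorch sketch (the helper names permutation_masks and masked_attention are ours, not from the XLNet codebase) of how one sampled order translates into the two attention masks behind two-stream self-attention: the content stream may attend to every token predicted at or before its own step, while the query stream excludes the token's own content so the representation stays target-aware without leaking the token it must predict.

    import torch
    import torch.nn.functional as F

    def permutation_masks(z: torch.Tensor):
        """Attention masks for one sampled factorization order z.

        z[k] is the original sequence position predicted at step k.
        Returns two (T, T) boolean masks; entry (i, j) is True when
        original position i is allowed to attend to original position j.
        """
        T = z.size(0)
        rank = torch.empty(T, dtype=torch.long)
        rank[z] = torch.arange(T)  # rank[p] = step at which position p is predicted
        # Content stream: sees tokens predicted no later than itself (including its own content).
        content_mask = rank.unsqueeze(1) >= rank.unsqueeze(0)
        # Query stream: sees only strictly earlier tokens, never its own content,
        # which keeps the prediction target-aware without revealing the answer.
        # Note: in the full model the query stream also attends to cached memory from the
        # previous segment (Transformer-XL recurrence), so no row ends up fully masked.
        query_mask = rank.unsqueeze(1) > rank.unsqueeze(0)
        return content_mask, query_mask

    def masked_attention(q, k, v, mask):
        """Single-head scaled dot-product attention restricted by a boolean visibility mask."""
        scores = (q @ k.transpose(-2, -1)) / k.size(-1) ** 0.5
        scores = scores.masked_fill(~mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    # Example: factorization order 2 -> 0 -> 3 -> 1 for a length-4 sequence.
    z = torch.tensor([2, 0, 3, 1])
    content_mask, query_mask = permutation_masks(z)
    h = torch.randn(4, 8)  # toy hidden states
    out = masked_attention(h, h, h, content_mask)

Only the masks change between sampled orders; the underlying Transformer parameters are shared, which is why a single model can be trained over the expectation of all factorization orders.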