Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

2 Jun 2019 | Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*equal contribution)
Transformer-XL is a novel neural architecture designed to enable learning dependencies beyond a fixed-length context in language modeling. It introduces a segment-level recurrence mechanism and a novel relative positional encoding scheme to address the limitations of fixed-length contexts. The method not only captures longer-term dependencies but also resolves the context-fragmentation problem, achieving better performance on both short and long sequences. Transformer-XL learns dependencies substantially longer than both RNNs and vanilla Transformers, and is up to 1,800+ times faster than vanilla Transformers during evaluation. It improves state-of-the-art results on several benchmarks, including WikiText-103, enwik8, One Billion Word, and Penn Treebank. The code, pre-trained models, and hyperparameters are available in both TensorFlow and PyTorch.
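Below is a minimal sketch of the segment-level recurrence idea: hidden states from the previous segment are cached (with gradients stopped) and prepended to the current segment so attention can reach beyond the segment boundary. This is an illustrative assumption, not the paper's implementation; the class name SegmentRecurrentAttention and the memory handling are hypothetical, and the paper's relative positional encoding is omitted in favor of PyTorch's standard nn.MultiheadAttention for brevity.

```python
import torch
import torch.nn as nn
from typing import Optional, Tuple


class SegmentRecurrentAttention(nn.Module):
    """Sketch of segment-level recurrence: attend over the concatenation of
    the cached previous segment and the current segment. The cache is
    detached so gradients do not flow across segment boundaries."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(
        self, x: torch.Tensor, memory: Optional[torch.Tensor]
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        # x: (batch, seg_len, d_model); memory: hidden states cached from the previous segment
        if memory is not None:
            context = torch.cat([memory.detach(), x], dim=1)  # extended context, no gradient into the cache
        else:
            context = x
        out, _ = self.attn(query=x, key=context, value=context, need_weights=False)
        new_memory = x.detach()  # cache the current segment's states for the next step
        return out, new_memory


# Usage: process a long sequence segment by segment, carrying the memory forward.
layer = SegmentRecurrentAttention(d_model=64, n_heads=4)
memory = None
for segment in torch.randn(3, 2, 16, 64):  # 3 consecutive segments of length 16
    output, memory = layer(segment, memory)
```

Because the cached states are reused rather than recomputed, evaluation can proceed one segment at a time over an arbitrarily long context, which is the source of the large evaluation speedup reported in the paper.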