Linformer: Self-Attention with Linear Complexity


14 Jun 2020 | Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
Linformer is a self-attention mechanism that reduces the complexity of the Transformer from O(n²) to O(n) in both time and space with respect to sequence length n. The key observation is that the self-attention matrix is approximately low-rank, so it can be approximated by projecting the keys and values down to a small fixed dimension k before computing attention. The resulting model performs as well as a standard Transformer while being significantly more memory- and time-efficient.

The main efficiency bottleneck in Transformer models is self-attention, whose cost grows quadratically with sequence length. Linformer exploits the low-rank property of the self-attention matrix to bring this cost down to linear in both time and space.

Linformer is evaluated by pretraining on BookCorpus and English Wikipedia and fine-tuning on downstream tasks such as IMDB sentiment classification and the GLUE benchmark. Across these tasks it performs comparably to, or slightly better than, the standard Transformer while achieving significant training and inference speedups. A minimal sketch of the projected attention computation follows.
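The sketch below illustrates the core idea under the paper's notation: learned projection matrices E and F (each k × n) compress the keys and values along the sequence axis before the usual scaled dot-product attention, so the attention map has shape n × k instead of n × n. This is a minimal NumPy illustration written for this summary, not the authors' released code; in the actual model E and F are learned jointly with the other weights rather than sampled at random as in the toy usage here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Single-head scaled dot-product attention with Linformer-style
    low-rank projection of the keys and values.

    Q, K, V: (n, d) query/key/value matrices for a sequence of length n.
    E, F:    (k, n) projection matrices with k << n (learned in the real model).

    Cost is O(n*k) instead of the O(n^2) of standard self-attention.
    """
    d = Q.shape[-1]
    K_proj = E @ K                      # (k, d): compress keys along the sequence axis
    V_proj = F @ V                      # (k, d): compress values along the sequence axis
    scores = Q @ K_proj.T / np.sqrt(d)  # (n, k) scores instead of (n, n)
    P = softmax(scores, axis=-1)        # (n, k) low-rank attention map
    return P @ V_proj                   # (n, d) output, same shape as standard attention

# Toy usage: sequence length n = 512 compressed to projected dimension k = 64.
rng = np.random.default_rng(0)
n, d, k = 512, 64, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = rng.standard_normal((k, n)), rng.standard_normal((k, n))
out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (512, 64)
```

Note that only the sequence axis is compressed; the per-token feature dimension d is untouched, which is why the output keeps the same shape as standard attention.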
Linformer also incorporates additional efficiency techniques, such as sharing the projection parameters (across heads, between keys and values, or across layers) and using nonuniform projected dimensions across layers. These variants further reduce the parameter count and memory consumption without significant performance degradation.

Experiments show that Linformer has significantly faster inference and supports larger batch sizes than the standard Transformer. Its accuracy is determined mainly by the projected dimension k rather than the ratio n/k, so longer sequences can be handled without growing k.

More broadly, Linformer helps make Transformers efficient enough for on-device deployment and cheaper training. It also has potential benefits for training transformers on images and for reducing power consumption, with positive environmental impact. The authors note no immediate negative ethical or societal impacts beyond those applicable to other core building blocks of deep learning.
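To make the parameter-sharing idea concrete, here is an illustrative multi-head wrapper around the linformer_attention function from the previous sketch. The terms "headwise" and "key-value" sharing come from the paper; the function name multihead_linformer and the omission of the final output projection are simplifications made for this sketch, not the authors' implementation.

```python
import numpy as np
# Assumes linformer_attention (and its softmax helper) from the previous sketch is in scope.

def multihead_linformer(X, Wq, Wk, Wv, E, F=None):
    """Multi-head Linformer-style attention with shared projections.

    X:        (n, d_model) input sequence.
    Wq/Wk/Wv: lists of per-head (d_model, d_head) weight matrices.
    E:        (k, n) projection shared across all heads (headwise sharing).
    F:        optional separate value projection; if None, E is reused for the
              values as well (key-value sharing), halving the projection parameters.
    """
    if F is None:
        F = E  # key-value sharing: one projection for both keys and values
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(Wq, Wk, Wv):
        Q, K, V = X @ Wq_i, X @ Wk_i, X @ Wv_i
        heads.append(linformer_attention(Q, K, V, E, F))
    # Real models apply an output projection after concatenation; omitted here.
    return np.concatenate(heads, axis=-1)  # (n, num_heads * d_head)

# Toy usage with 4 heads, reusing n, d, k, and rng from the previous sketch.
num_heads, d_head = 4, 16
Wq = [rng.standard_normal((d, d_head)) for _ in range(num_heads)]
Wk = [rng.standard_normal((d, d_head)) for _ in range(num_heads)]
Wv = [rng.standard_normal((d, d_head)) for _ in range(num_heads)]
X = rng.standard_normal((n, d))
print(multihead_linformer(X, Wq, Wk, Wv, E=rng.standard_normal((k, n))).shape)  # (512, 64)
```

Layerwise sharing, the most aggressive variant reported in the paper, simply reuses one projection matrix across every layer as well as every head.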