Quantum linear algebra is all you need for Transformer architectures

June 3, 2024 | Naixu Guo, Zhan Yu, Matthew Choi, Aman Agrawal, Kouhei Nakaji, Alán Aspuru-Guzik, Patrick Rebentrost
The paper explores the application of quantum computing to transformer architectures, a key component in large language models (LLMs). The authors investigate how quantum algorithms can be used to perform the linear algebra operations involved in the transformer, focusing in particular on fault-tolerant quantum computing. They construct quantum subroutines for essential building blocks such as self-attention, residual connections, layer normalization, and feed-forward neural networks. The main contributions include:

1. **Element-wise Function of Block-Encoded Matrices**: a subroutine that implements element-wise functions on block-encodings, which is crucial for the softmax function in the self-attention block. The subroutine uses polynomial approximations and Hadamard products of block-encodings (a minimal classical sketch of this idea appears after the summary).
2. **Conversion Between State Preparation Encoding and Matrix Block Encoding**: a method to convert between state preparation encoding and matrix block encoding, which is essential for a coherent implementation of the full transformer architecture on quantum computers.
3. **Quantum Self-Attention**: a construction of the quantum self-attention block, which implements the softmax function and the other necessary procedures using the element-wise-function method and amplitude encoding.
4. **Quantum Residual Connection and Layer Normalization**: quantum analogs of these classical operations, built so that the quantum circuit can handle the same input assumptions as the classical version.
5. **Quantum Feed-Forward Network**: a quantum circuit for the feed-forward network, a two-layer fully connected network (a classical reference implementation of the full block is sketched below).

The paper also discusses potential quantum advantages and challenges, including the computational resources required for inference and the limitations of current quantum hardware. The authors provide numerical experiments to verify their theoretical results and discuss the generalization to multi-layer architectures. Overall, the work aims to demonstrate the feasibility of using quantum computing to enhance the efficiency of large language models.
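To make the first contribution concrete, here is a minimal classical sketch of what the element-wise-function subroutine computes: an entry-wise exponential built from a truncated polynomial whose terms are Hadamard (entry-wise) powers of the score matrix, followed by the row normalization that yields the softmax. This is plain numpy arithmetic, not the authors' quantum construction; the truncation order `K` and the matrix size are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: a classical stand-in for the element-wise-function subroutine.
# On a quantum computer the entry-wise powers of a block-encoded matrix would be
# built from Hadamard products of block-encodings; here numpy emulates the
# arithmetic to show what the subroutine computes.

def elementwise_exp_via_polynomial(A, K=8):
    """Approximate exp(A) entry-wise by the truncated Taylor polynomial
    sum_{k=0}^{K} A^{(k)} / k!, where A^{(k)} is the k-fold Hadamard power."""
    approx = np.zeros_like(A)
    hadamard_power = np.ones_like(A)        # 0-th Hadamard power (all ones)
    factorial = 1.0
    for k in range(K + 1):
        approx += hadamard_power / factorial
        hadamard_power = hadamard_power * A  # next Hadamard power
        factorial *= (k + 1)
    return approx

# Softmax of attention scores: entry-wise exp followed by row normalization.
rng = np.random.default_rng(0)
scores = rng.uniform(-1, 1, size=(4, 4))    # entries bounded by 1, as block-encodings require
expd = elementwise_exp_via_polynomial(scores)
softmax_rows = expd / expd.sum(axis=1, keepdims=True)

# Compare against the exact softmax; the truncated polynomial is already very close.
exact = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(np.max(np.abs(softmax_rows - exact)))
```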
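For reference, the classical single-layer transformer block that the quantum subroutines target (self-attention, residual connection, layer normalization, and a two-layer feed-forward network) can be written in a few lines. This is a hedged sketch of the standard architecture, not code from the paper; the weight shapes, random inputs, and omission of learned normalization parameters are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learned scale/shift here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer_block(X, Wq, Wk, Wv, W1, W2):
    d = Wq.shape[1]
    # Self-attention: softmax of scaled query-key scores applied to the values.
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    attn = softmax(scores) @ (X @ Wv)
    # Residual connection followed by layer normalization.
    h = layer_norm(X + attn)
    # Two-layer fully connected feed-forward network with ReLU, then residual + norm.
    ff = np.maximum(h @ W1, 0) @ W2
    return layer_norm(h + ff)

# Tiny example with random weights (shapes are illustrative only).
rng = np.random.default_rng(1)
n, d, d_ff = 6, 8, 16
X = rng.normal(size=(n, d))
out = transformer_block(X,
                        rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                        rng.normal(size=(d, d)),
                        rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d)))
print(out.shape)  # (6, 8)
```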