On Limitations of the Transformer Architecture

February 28, 2024 | Binghui Peng, Srini Narayanan, Christos Papadimitriou
The paper "On Limitations of the Transformer Architecture" by Binghui Peng, Srini Narayanan, and Christos Papadimitriou explores the root causes of hallucinations in large language models (LLMs), particularly focusing on the Transformer architecture. The authors use Communication Complexity to prove that the Transformer layer is incapable of composing functions, such as identifying a grandparent in a genealogy, if the domains of the functions are large enough. They demonstrate that this inability is already empirically present even for small domains. The paper also highlights that several mathematical tasks, which are considered hard for LLMs, are unlikely to be solvable by Transformers for large instances, assuming certain well-accepted conjectures in Computational Complexity. These tasks include function composition, iterated function composition, and compositional tasks like multiplication of multi-digit integers and solving logical puzzles. The authors provide a detailed proof that a single Transformer attention layer cannot compute the answer to a function composition query correctly with significant probability of success, provided that the size of the domain of the function satisfies a certain condition. They also show that multi-layer Transformers are incapable of performing several elementary computations crucial for carrying out compositional tasks. Additionally, the paper discusses the role of Chain of Thought (CoT) in mitigating hallucinations. While CoT can help with function composition, the authors prove that an arbitrarily large number of CoT steps are needed to solve the generalization of composition to many function applications. The paper concludes with a discussion on the limitations of the Transformer architecture, emphasizing the need for more sophisticated attention layers to address these limitations while maintaining efficiency and effectiveness.The paper "On Limitations of the Transformer Architecture" by Binghui Peng, Srini Narayanan, and Christos Papadimitriou explores the root causes of hallucinations in large language models (LLMs), particularly focusing on the Transformer architecture. The authors use Communication Complexity to prove that the Transformer layer is incapable of composing functions, such as identifying a grandparent in a genealogy, if the domains of the functions are large enough. They demonstrate that this inability is already empirically present even for small domains. The paper also highlights that several mathematical tasks, which are considered hard for LLMs, are unlikely to be solvable by Transformers for large instances, assuming certain well-accepted conjectures in Computational Complexity. These tasks include function composition, iterated function composition, and compositional tasks like multiplication of multi-digit integers and solving logical puzzles. The authors provide a detailed proof that a single Transformer attention layer cannot compute the answer to a function composition query correctly with significant probability of success, provided that the size of the domain of the function satisfies a certain condition. They also show that multi-layer Transformers are incapable of performing several elementary computations crucial for carrying out compositional tasks. Additionally, the paper discusses the role of Chain of Thought (CoT) in mitigating hallucinations. 
The paper also examines the role of Chain of Thought (CoT) prompting in mitigating hallucinations. While CoT can help with function composition, the authors prove that an arbitrarily large number of CoT steps are needed to solve the generalization of composition to many function applications (iterated function composition). The paper concludes with a discussion of these limitations of the Transformer architecture, emphasizing the need for more sophisticated attention layers that address them while maintaining efficiency and effectiveness.
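As a companion sketch, iterated function composition (apply the same function k times) can be written down in a few lines; the random permutation and parameters below are an illustrative instance, not the paper's construction. Intuitively, each application produces one intermediate value of the kind a chain-of-thought derivation has to spell out, which is why the number of required steps grows with k.

```python
# Minimal sketch of iterated function composition: compute f^k(x).
# The random permutation and parameters are illustrative assumptions.
import random

def iterate(f: dict[int, int], x: int, k: int) -> int:
    """Apply f to x, k times: f(f(...f(x)...))."""
    for _ in range(k):
        x = f[x]
    return x

if __name__ == "__main__":
    n, k = 16, 5                # domain size and number of applications
    rng = random.Random(0)
    image = list(range(n))
    rng.shuffle(image)
    f = dict(enumerate(image))  # a random permutation of {0, ..., n-1}
    x0 = 3
    print(f"f applied {k} times to {x0} ->", iterate(f, x0, k))
```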