Simulation of Graph Algorithms with Looped Transformers

29 May 2024 | Artur Back de Luca, Kimon Fountoulakis
This paper explores the use of looped transformers to simulate graph algorithms, focusing on theoretical aspects. The authors demonstrate that a looped transformer architecture with extra attention heads can simulate several graph algorithms, including Dijkstra's shortest-path algorithm, Breadth-First Search (BFS), Depth-First Search (DFS), and Kosaraju's strongly connected components algorithm. A key feature of the construction is that the network's parameter count does not scale with the size of the input graph, so a single set of weights applies to graphs of varying sizes; however, finite precision limits how large a graph can be simulated exactly. The paper also presents a Turing completeness result for the looped transformer with constant width when additional attention heads are used. The authors provide detailed constructions and proofs for each algorithm, showing how the network simulates each step. They also discuss the difficulty of training such networks: approximating discontinuous algorithmic operations with continuous functions leads to ill-conditioning. Despite these challenges, the paper highlights the potential of looped transformers for algorithmic reasoning on graphs.
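To make the "looped" idea concrete, below is a minimal sketch of the pattern in PyTorch: a single transformer block with a fixed number of parameters is applied repeatedly, one iteration per simulated algorithm step, so depth grows with the graph size while the weight count does not. The block definition, the input encoding, and the step count are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of the looped-transformer pattern (illustrative, not the
# paper's construction): the same block is reused for every loop iteration.
import torch
import torch.nn as nn


class GraphTransformerBlock(nn.Module):
    """One transformer layer with multiple attention heads (hypothetical placeholder)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, _ = self.attn(x, x, x)
        x = x + a               # attention sub-layer with residual connection
        return x + self.mlp(x)  # feed-forward sub-layer with residual connection


class LoopedTransformer(nn.Module):
    """Applies one shared block for a given number of loop iterations."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        # Parameters live in a single block and are reused at every iteration,
        # so the parameter count is independent of the input graph size.
        self.block = GraphTransformerBlock(dim, num_heads)

    def forward(self, x: torch.Tensor, num_steps: int) -> torch.Tensor:
        # One loop iteration per simulated algorithm step
        # (e.g., one BFS frontier expansion).
        for _ in range(num_steps):
            x = self.block(x)
        return x


if __name__ == "__main__":
    n_nodes, dim = 8, 16
    # x would encode the graph (adjacency information plus per-node state);
    # the paper's exact encoding is not reproduced here.
    x = torch.randn(1, n_nodes, dim)
    model = LoopedTransformer(dim=dim, num_heads=4)
    out = model(x, num_steps=n_nodes)  # e.g., at most n iterations for BFS on n nodes
    print(out.shape)  # torch.Size([1, 8, 16])
```

The design point the sketch illustrates is weight sharing across iterations: scaling to larger graphs is handled by running more loop steps, not by adding parameters.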