29 May 2024 | Artur Back de Luca*, Kimon Fountoulakis*
This paper presents a looped transformer architecture with extra attention heads that can simulate a range of graph algorithms, including Dijkstra's shortest-path algorithm, Breadth-First Search (BFS), Depth-First Search (DFS), and Kosaraju's algorithm for strongly connected components (SCC). The graph is encoded as an adjacency matrix that the transformer reads directly, so the number of parameters does not grow with graph size: the network width stays constant regardless of the input graph. Simulation proceeds by looping over the transformer layers, with each iteration executing one step of the underlying algorithm. The same network can simulate multiple algorithms, and with the extra attention heads the architecture is shown to be Turing complete at constant width.

The paper also discusses the limitations imposed by finite precision, and the challenges of training the model to perform these simulations, in particular the difficulty of approximating discontinuous algorithmic steps with continuous functions. Theoretical analysis and empirical validation support the claims, showing that the looped transformer can simulate a wide range of graph algorithms on graphs of varying sizes and is a promising approach to algorithmic reasoning on graphs.
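To make the looping idea concrete, below is a minimal sketch of the general pattern: one transformer block applied repeatedly over the same weights, with the adjacency matrix exposed to attention so that width does not depend on graph size. This is an illustrative assumption-laden example, not the paper's construction; the class name `LoopedSimulator`, the use of the adjacency matrix as an attention mask, and the PyTorch layer choices are all hypothetical stand-ins for the paper's specific encoding.

```python
import torch
import torch.nn as nn


class LoopedSimulator(nn.Module):
    """One transformer block applied repeatedly; each pass is one algorithm step."""

    def __init__(self, d_model: int = 32, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )

    def forward(self, x: torch.Tensor, adj: torch.Tensor, n_steps: int) -> torch.Tensor:
        # x:   (1, n_nodes, d_model) node states
        # adj: (n_nodes, n_nodes) adjacency matrix
        # Turn the adjacency matrix into an attention mask: each node may attend
        # only to itself and its neighbours, so the width is independent of n_nodes.
        allowed = (adj + torch.eye(adj.size(0))) > 0
        mask = torch.where(
            allowed, torch.zeros_like(adj), torch.full_like(adj, float("-inf"))
        )
        for _ in range(n_steps):  # loop over the *same* weights
            attn_out, _ = self.attn(x, x, x, attn_mask=mask)
            x = x + attn_out      # one iteration ~ one step of the simulated algorithm
            x = x + self.mlp(x)
        return x


# Example: a 4-node path graph, looped for 4 iterations
# (roughly mirroring how a BFS frontier would expand).
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float32)
states = torch.randn(1, 4, 32)
out = LoopedSimulator()(states, adj, n_steps=4)
```

The key design point the sketch tries to mirror is that the graph enters through the attention pattern rather than through the weights, so the same fixed-width network can, in principle, be looped for as many steps as the simulated algorithm requires on graphs of different sizes.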