Polynormer: Polynomial-Expressive Graph Transformer in Linear Time


6 Apr 2024 | Chenhui Deng, Zichao Yue, Zhiru Zhang
**Chenhui Deng, Zichao Yue, Zhiru Zhang**
Cornell University, Ithaca, USA
{cd574, zy383, zhiruz}@cornell.edu

**Abstract**

Graph transformers (GTs) have emerged as a promising architecture that is theoretically more expressive than message-passing graph neural networks (GNNs). However, typical GT models have at least quadratic complexity and thus cannot scale to large graphs. While several linear GTs have recently been proposed, they still lag behind their GNN counterparts on several popular graph datasets, which raises a critical concern about their practical expressivity. To balance the trade-off between expressivity and scalability of GTs, we propose Polynormer, a polynomial-expressive GT model with linear complexity. Polynormer is built upon a novel base model that learns a high-degree polynomial on input features. To make the base model permutation equivariant, we integrate it with graph topology and node features separately, resulting in local and global equivariant attention models. Consequently, Polynormer adopts a linear local-to-global attention scheme to learn high-degree equivariant polynomials whose coefficients are controlled by attention scores. Polynormer has been evaluated on 13 homophilic and heterophilic datasets, including large graphs with millions of nodes. Our extensive experimental results show that Polynormer outperforms state-of-the-art GNN and GT baselines on most datasets, even without the use of nonlinear activation functions. Source code of Polynormer is freely available at: github.com/cornell-zhang/Polynormer.

**Introduction**

Conventional graph neural networks (GNNs) suffer from over-smoothing and over-squashing issues, leading to limited expressive power. Graph transformers (GTs) have become increasingly popular due to their ability to attend to all nodes in a graph, inherently overcoming these limitations. However, the number of GT layers is typically restricted to a small constant, limiting their expressivity. Prior work has attempted to enhance GT expressivity by incorporating inductive biases through positional encoding (PE) and structural encoding (SE). However, these methods generally involve nontrivial overheads and adopt the self-attention module of the vanilla Transformer, which has quadratic complexity with respect to the number of nodes, prohibiting their application to large-scale node classification tasks. To address the scalability challenge, many linear GTs have been proposed, but they lack a thorough analysis of their expressivity in practice and may perform worse than state-of-the-art GNNs. In this work, we propose Polynormer, a linear GT model that is polynomial-expressive. An $L$-layer Polynormer can expressively represent a polynomial of degree $2^L$, which maps input node features to output node representations and is equivariant to node permutations. The polynomial expressivity is motivated by the Weierstrass approximation theorem, which states that any continuous function on a compact domain can be uniformly approximated by polynomials.
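To make the degree-$2^L$ claim concrete, the toy sketch below (not Polynormer's exact layer; the class name and dimensions are illustrative assumptions) stacks layers whose output is the elementwise product of two linear maps of the previous representation. Each such layer doubles the polynomial degree of the output in the input features, so $L$ stacked layers yield degree $2^L$ without any nonlinear activation.

```python
import torch
import torch.nn as nn

class DegreeDoublingLayer(nn.Module):
    """Toy layer: the output is a Hadamard product of two linear maps of the
    input, so each output entry is a degree-2 polynomial of the input entries.
    Stacking L such layers gives degree-2^L polynomials overall.
    (Illustrative only; Polynormer additionally ties the product coefficients
    to local/global attention scores to remain permutation equivariant.)"""
    def __init__(self, dim):
        super().__init__()
        self.lin1 = nn.Linear(dim, dim, bias=False)
        self.lin2 = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        return self.lin1(x) * self.lin2(x)  # Hadamard product -> degree doubles

# With L = 3 layers, every output entry is a polynomial of degree 2^3 = 8
# in the entries of x, and no nonlinear activation is used anywhere.
model = nn.Sequential(*[DegreeDoublingLayer(16) for _ in range(3)])
x = torch.randn(100, 16)   # 100 nodes, 16-dimensional features
out = model(x)             # shape (100, 16)
```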
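The "linear time" claim refers to attention whose cost grows linearly in the number of nodes $n$ rather than quadratically. As a generic illustration only (Polynormer's actual global attention may differ; the elu+1 feature map below is an assumption borrowed from kernelized-attention literature), the sketch contrasts softmax attention, which materializes an $n \times n$ score matrix, with a kernelized form that exploits associativity to avoid it.

```python
import torch
import torch.nn.functional as F

def quadratic_attention(Q, K, V):
    # Standard softmax attention: builds an n x n score matrix,
    # so time and memory grow quadratically with the number of nodes n.
    scores = torch.softmax(Q @ K.transpose(-1, -2) / Q.shape[-1] ** 0.5, dim=-1)
    return scores @ V                               # O(n^2 * d)

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized attention with a non-negative feature map (elu + 1, assumed
    # here for illustration): computing phi(K)^T V first gives a d x d matrix,
    # so the n x n score matrix is never formed.
    phi_q = F.elu(Q) + 1
    phi_k = F.elu(K) + 1
    kv = phi_k.transpose(-1, -2) @ V                # d x d, costs O(n * d^2)
    norm = phi_q @ phi_k.sum(dim=-2, keepdim=True).transpose(-1, -2)  # n x 1
    return (phi_q @ kv) / (norm + eps)              # O(n * d^2) overall

n, d = 10_000, 64
Q, K, V = (torch.randn(n, d) for _ in range(3))
out = linear_attention(Q, K, V)                     # scales linearly in n
```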