Distinguished In Uniform: Self-Attention Vs. Virtual Nodes

20 May 2024 | Eran Rosenbluth, Jan Tönshoff, Martin Ritzert, Berke Kisin & Martin Grohe
The paper compares the expressivity of Graph Transformers (GTs) and Message-Passing Graph Neural Networks with Virtual Nodes (MPGNN+VN). It shows that neither architecture is a universal approximator in the uniform setting, where a single model must work for graphs of all sizes. The essential difference between the two classes is their global computation method: GTs use self-attention, while MPGNN+VN routes global information through a virtual node connected to every node in the graph. The main result is that neither model's uniform expressivity subsumes the other's: each class can express functions the other cannot. For instance, a virtual node can sum features over all nodes, a form of unbounded aggregation, whereas self-attention supports global computations that virtual-node aggregation cannot replicate. Experiments on synthetic and real-world datasets show that neither architecture consistently outperforms the other, indicating that each has distinct strengths and weaknesses in terms of expressivity and efficiency.
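To make the contrast concrete, below is a minimal sketch of the two global-computation steps being compared. It is not the paper's code (the paper studies full models such as GPS-style GTs); all function names, shapes, and the single-head, projection-free attention are illustrative assumptions.

import numpy as np

def virtual_node_update(H: np.ndarray) -> np.ndarray:
    """Virtual-node style global step: the virtual node aggregates an
    unnormalized sum over all node embeddings and broadcasts it back.
    The sum is not divided by the node count, so its magnitude can grow
    with graph size (unbounded aggregation)."""
    vn = H.sum(axis=0)   # (d,) global summary over all n nodes
    return H + vn        # broadcast the summary to every node

def self_attention_update(H: np.ndarray) -> np.ndarray:
    """Self-attention style global step (single head, no learned
    projections, for brevity): each node receives a softmax-weighted
    average of all node embeddings. The weights sum to 1, so the output
    is a convex combination regardless of graph size."""
    scores = H @ H.T / np.sqrt(H.shape[1])            # (n, n) attention logits
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return H + weights @ H

H = np.random.default_rng(0).normal(size=(5, 8))      # 5 nodes, 8-dim features
print(virtual_node_update(H).shape, self_attention_update(H).shape)

The sketch reflects the intuition behind the separation: the virtual node's sum scales with the number of nodes, while softmax attention always yields an average whose scale is independent of graph size.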