Distinguished In Uniform: Self-Attention Vs. Virtual Nodes

20 May 2024 | Eran Rosenbluth, Jan Tönshoff, Martin Ritzert, Berke Kisin & Martin Grohe
The paper compares the expressivity of Graph Transformers (GTs) and Message-Passing Graph Neural Networks with Virtual Nodes (MPGNN+VN). It shows that neither architecture is a universal approximator in the uniform setting, where a single model must work for graphs of all sizes. The essential difference between the two classes is their global computation method: GTs use self-attention, while MPGNN+VN routes global information through a virtual node connected to every node in the graph. The main result is that neither model's uniform expressivity subsumes the other's: each class can express functions the other cannot. For instance, a virtual node can sum features over all nodes, a form of unbounded aggregation, whereas self-attention supports global computations that virtual-node aggregation cannot replicate. Experiments on synthetic and real-world datasets show that neither architecture consistently outperforms the other, indicating that each has distinct strengths and weaknesses in terms of expressivity and efficiency.
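To make the contrast concrete, below is a minimal sketch of the two global-computation steps being compared. It is not the paper's code (the paper studies full models such as GPS-style GTs); all function names, shapes, and the single-head, projection-free attention are illustrative assumptions.

import numpy as np

def virtual_node_update(H: np.ndarray) -> np.ndarray:
    """Virtual-node style global step: the virtual node aggregates an
    unnormalized sum over all node embeddings and broadcasts it back.
    The sum is not divided by the node count, so its magnitude can grow
    with graph size (unbounded aggregation)."""
    vn = H.sum(axis=0)   # (d,) global summary over all n nodes
    return H + vn        # broadcast the summary to every node

def self_attention_update(H: np.ndarray) -> np.ndarray:
    """Self-attention style global step (single head, no learned
    projections, for brevity): each node receives a softmax-weighted
    average of all node embeddings. The weights sum to 1, so the output
    is a convex combination regardless of graph size."""
    scores = H @ H.T / np.sqrt(H.shape[1])            # (n, n) attention logits
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return H + weights @ H

H = np.random.default_rng(0).normal(size=(5, 8))      # 5 nodes, 8-dim features
print(virtual_node_update(H).shape, self_attention_update(H).shape)

The sketch reflects the intuition behind the separation: the virtual node's sum scales with the number of nodes, while softmax attention always yields an average whose scale is independent of graph size.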