Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces

1 Feb 2024 | Chloe Wang, Oleksii Tsepa, Jun Ma, Bo Wang
Graph-Mamba is a novel graph model that integrates a selective state space model (SSM) to achieve efficient, data-dependent context selection, serving as an alternative to graph attention sparsification. The model introduces the Graph-Mamba block (GMB), which combines the recurrent scan from sequence modeling with a selection mechanism to achieve two levels of graph sparsification. The first level comes from the selection mechanism in the Mamba module, which filters relevant information within the long-range context. The second level comes from the proposed node prioritization approach, which allows important nodes in the graph to access more context. Together, these sequence-modeling features offer a promising avenue for combining data-dependent and heuristic-informed selection for graph sparsification. Moreover, because Graph-Mamba is implemented with the Mamba module, it runs in linear time, providing an efficient alternative to dense graph attention.

Graph-Mamba adapts sequence modeling techniques such as Mamba to non-sequential graph input. It incorporates a node prioritization strategy and a permutation-based training and inference recipe to promote permutation invariance. The node prioritization strategy uses node heuristics, such as node degree, to sort nodes in the input sequence, placing more important nodes at the end of the sequence for informed sparsification. The permutation-based recipe randomly shuffles nodes of the same degree during training to minimize bias towards any particular ordering; at inference time, the outputs from multiple GMB layers are averaged to ensure stability and permutation invariance.
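To make the node prioritization and permutation recipe concrete, the following is a minimal sketch of one way they could be implemented; it is not the authors' code. The function names, the PyTorch usage, and the choice to average over several random orderings at inference are illustrative assumptions based on the description above.

```python
import torch

def degree_prioritized_order(degrees: torch.Tensor, training: bool) -> torch.Tensor:
    """Order nodes by ascending degree so high-degree nodes sit at the end
    of the sequence, where the recurrent scan has accumulated the most
    context. During training, nodes with equal degree are randomly
    shuffled to avoid biasing the model toward one arbitrary ordering."""
    n = degrees.numel()
    # Start from a random permutation (training) or the identity (inference),
    # then stable-sort by degree so only ties retain that initial order.
    base = torch.randperm(n) if training else torch.arange(n)
    _, idx = torch.sort(degrees[base], stable=True)
    return base[idx]

@torch.no_grad()
def permutation_averaged_predict(model, x, degrees, num_runs: int = 4):
    """Illustrative inference recipe: run a (hypothetical) GMB-style model
    under several random tie-breaking orderings and average the per-node
    outputs for stability and approximate permutation invariance."""
    outputs = []
    for _ in range(num_runs):
        order = degree_prioritized_order(degrees, training=True)  # re-sample ties
        inverse = torch.empty_like(order)
        inverse[order] = torch.arange(order.numel())
        y = model(x[order])          # sequence-model forward over reordered nodes
        outputs.append(y[inverse])   # map outputs back to original node order
    return torch.stack(outputs).mean(dim=0)
```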
Extensive experiments on ten benchmark datasets demonstrate that Graph-Mamba outperforms state-of-the-art methods on long-range graph prediction tasks at a fraction of the computational cost in both FLOPs and GPU memory consumption. The model achieves linear-time computational complexity and reduces GPU memory consumption by up to 74% on large graphs. On smaller graphs, Graph-Mamba's performance is comparable to that of other sparse and dense attention methods, showing its robustness and generalizability to common graph-based prediction tasks. This efficiency and effectiveness make Graph-Mamba a promising alternative to traditional dense or sparse graph attention, offering competitive predictive power and long-range context awareness at much lower cost.
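For readers unfamiliar with the selection mechanism referenced above, below is a toy, per-channel selective scan in the spirit of Mamba: the step size, write gate, and readout all depend on the current input, so the linear-time recurrence can keep or discard context in a data-dependent way. This is a sketch of the general idea only; the parameter shapes and names are assumptions, and the real Mamba module uses a different state parameterization and a hardware-aware scan.

```python
import torch
import torch.nn.functional as F

def toy_selective_scan(x, A, W_delta, W_B, W_C):
    """Toy data-dependent (selective) recurrence, illustration only.

    x: (T, D) input sequence; A: (D,) negative decay rates;
    W_delta, W_B, W_C: (D, D) projections that make the step size,
    write gate, and readout functions of the current input.
    Runs in O(T * D) time, i.e. linear in sequence length."""
    T, D = x.shape
    h = torch.zeros(D)
    ys = []
    for t in range(T):
        delta = F.softplus(x[t] @ W_delta)   # input-dependent step size
        a_bar = torch.exp(delta * A)         # values near 1 retain state
        b_bar = delta * (x[t] @ W_B)         # input-dependent write gate
        h = a_bar * h + b_bar * x[t]         # selective recurrence
        ys.append((x[t] @ W_C) * h)          # input-dependent readout
    return torch.stack(ys)
```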