OpenGraph: Towards Open Graph Foundation Models

March 2024 | Lianghao Xia, Ben Kao and Chao Huang*
OpenGraph is a graph foundation model designed to excel in zero-shot graph learning tasks across diverse domains. The model addresses key challenges in graph learning, including node token set shifts, efficient node-wise dependency modeling, and domain-specific data scarcity. To overcome these challenges, OpenGraph introduces a unified graph tokenizer that transforms input graphs into unified token sequences, a scalable graph transformer that captures global node-wise dependencies, and a data augmentation mechanism enhanced by large language models (LLMs) to generate synthetic graph data.

The graph tokenizer employs a topology-aware projection to generate universal graph tokens, while the scalable graph transformer uses efficient self-attention with anchor sampling to ensure computational efficiency. The data augmentation mechanism leverages LLMs to generate synthetic graphs that closely resemble real-world instances, enhancing the pre-training process.

Extensive experiments on various graph datasets demonstrate that OpenGraph achieves exceptional generalization capabilities, surpassing baselines even in few-shot learning scenarios. The model's ability to generalize across different graph domains and datasets highlights its potential as a foundation model for graph learning. The study also evaluates the impact of different pre-training datasets, sampling strategies, and model scales on performance, showing that generated data and appropriate sampling strategies significantly improve model effectiveness. Overall, OpenGraph provides a versatile and scalable solution for graph learning, enabling effective zero-shot learning across diverse applications.
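To make the two core mechanisms concrete, the sketch below illustrates one plausible way a topology-aware tokenizer and anchor sampling could be implemented. This is not the authors' code: the function names, the smoothing order, the token dimension, and the use of a truncated SVD as the projection are illustrative assumptions, chosen only to show how graphs with different node sets can be mapped to fixed-width token sequences and how attention cost can be kept independent of the full node count.

```python
# Minimal sketch (not the official OpenGraph implementation) of a
# topology-aware graph tokenizer and anchor sampling for efficient attention.
# All names and hyperparameters here are assumptions for illustration.
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetrically normalize an adjacency matrix: D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def topology_aware_tokens(adj: np.ndarray, order: int = 2, dim: int = 64) -> np.ndarray:
    """Map an arbitrary input graph to a fixed-width sequence of node tokens.

    1. Smooth the normalized adjacency by summing its powers up to `order`,
       so each node token reflects multi-hop structure.
    2. Project the node-count-dependent smoothed matrix into a shared
       `dim`-dimensional token space (here via truncated SVD), so graphs
       with different node sets yield tokens of the same width.
    """
    a_norm = normalize_adjacency(adj)
    smoothed = np.zeros_like(a_norm)
    power = np.eye(adj.shape[0])
    for _ in range(order):
        power = power @ a_norm
        smoothed += power
    u, s, _ = np.linalg.svd(smoothed, full_matrices=False)
    tokens = u[:, :dim] * s[:dim]
    return tokens  # shape: (num_nodes, min(dim, num_nodes)), one token per node


def sample_anchors(num_nodes: int, num_anchors: int, seed: int | None = None) -> np.ndarray:
    """Anchor sampling: attend through a small node subset so self-attention
    cost scales with `num_anchors` rather than with the full node count."""
    rng = np.random.default_rng(seed)
    return rng.choice(num_nodes, size=min(num_anchors, num_nodes), replace=False)
```

In this reading, the tokenizer decouples the model from any dataset-specific node vocabulary (addressing node token set shifts), while anchor sampling trades exact all-pairs attention for a cheaper approximation over a sampled subset, which is what keeps the graph transformer scalable on large graphs.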