2024 | Haitao Mao, Zhikai Chen, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Mikhail Galkin, Jiliang Tang
Graph Foundation Models (GFMs) are emerging as a significant research topic in the graph domain, aiming to develop graph models trained on extensive and diverse data so that they apply across a wide range of tasks and domains. Unlike traditional Graph Neural Networks (GNNs), which are typically trained from scratch for a specific task on a particular dataset, GFMs face the challenge of effectively leveraging vast and diverse graph data to achieve positive transfer. Inspired by foundation models in computer vision (CV) and natural language processing (NLP), the paper proposes a "graph vocabulary" perspective, in which basic transferable units underlying graphs encode graph invariances. The vocabulary is constructed on principles from network analysis, expressiveness, and stability, and could guide future GFM designs that follow neural scaling laws.
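To make the vocabulary idea concrete, the sketch below illustrates one candidate transferable unit that recurs in the link-prediction literature: common-neighbor counts, a structure-only signal whose meaning does not change across graphs. This is an illustration of the concept, not the paper's concrete vocabulary construction; the function name is ours.

```python
# A minimal sketch of one candidate "vocabulary unit" for link prediction:
# local structural proximity such as common-neighbor counts, which keeps
# the same meaning on any graph. Illustrative only, not the paper's method.
import networkx as nx

def common_neighbor_score(graph, u, v):
    """Count shared neighbors of u and v: a feature-free signal that
    transfers across domains because it is defined purely by topology."""
    return len(set(graph[u]) & set(graph[v]))

# The same unit applies unchanged to a social graph, a citation graph, etc.
social = nx.karate_club_graph()
print(common_neighbor_score(social, 0, 33))
```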
Existing GFMs, such as ULTRA and OFA, have achieved initial success, including zero-shot generalization to unseen graphs. These models are categorized into task-specific, domain-specific, and primitive GFMs. ULTRA, a task-specific GFM for knowledge graph completion, builds on the NBFNet backbone, which enables inductive generalization through an expressive relational vocabulary. Its theoretically grounded graph of relations connects new, unseen relation types to existing ones, enabling positive transfer.
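The sketch below is a simplified reconstruction, not ULTRA's actual code, of how a graph of relations can be derived from raw triples: relations become nodes, and edges are added for the four relation-relation interaction types (head-to-head, head-to-tail, tail-to-head, tail-to-tail) whenever two relations share an entity. All names are illustrative assumptions.

```python
# A hedged sketch of ULTRA-style "graph of relations" construction from
# (head, relation, tail) triples. Names are illustrative, not ULTRA's API.
from collections import defaultdict
from itertools import product

def build_relation_graph(triples):
    """Return edges over relation types, typed by the four fundamental
    relation-relation interactions: h2h, h2t, t2h, t2t."""
    heads, tails = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        heads[h].add(r)   # relations that use entity h as a head
        tails[t].add(r)   # relations that use entity t as a tail

    edges = set()
    for rels_a, rels_b, etype in [
        (heads, heads, "h2h"),  # two relations sharing a head entity
        (heads, tails, "h2t"),  # one relation's head is another's tail
        (tails, heads, "t2h"),
        (tails, tails, "t2t"),  # two relations sharing a tail entity
    ]:
        for entity in rels_a.keys() & rels_b.keys():
            for r1, r2 in product(rels_a[entity], rels_b[entity]):
                if r1 != r2:
                    edges.add((r1, etype, r2))
    return edges

# A new, unseen relation gets connected to known relations purely through
# shared entities, which is what makes inductive transfer possible.
triples = [("alice", "works_at", "acme"), ("bob", "works_at", "acme"),
           ("acme", "located_in", "berlin")]
print(sorted(build_relation_graph(triples)))
```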
The key to successful GFM design lies in encoding the invariances shared across diverse graph data into a common representation space. The paper discusses transferability principles in node classification, link prediction, and graph classification, emphasizing the importance of network analysis, expressiveness, and stability. It also explores the feasibility of GFMs following neural scaling laws, highlighting the need for data scaling, model scaling, and leveraging large language models (LLMs) for graph tasks.
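To illustrate what "following neural scaling laws" would mean in practice, the sketch below fits the standard saturating power law loss(N) = a·N^(-b) + c to (data size, loss) measurements. The data points are synthetic placeholders, not results from the paper.

```python
# A hedged sketch of a scaling-law check: fit loss(N) = a * N**(-b) + c
# to observed losses at increasing data (or model) scale.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c

# Hypothetical (num_training_graphs, validation_loss) observations.
n = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
loss = np.array([1.20, 0.85, 0.62, 0.47, 0.38])

params, _ = curve_fit(power_law, n, loss, p0=[10.0, 0.3, 0.1], maxfev=10000)
a, b, c = params
print(f"fitted exponent b = {b:.3f}, irreducible loss c = {c:.3f}")
```

A clean power-law fit with a stable exponent across scales is the empirical signature one would look for before claiming a GFM benefits from further data or model scaling.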
The paper concludes that GFMs have the potential to reduce resource consumption and manual annotation across many domains, offering a versatile and fair approach to graph-based applications. It underscores the importance of a universal graph vocabulary and the open challenges in building GFMs that effectively serve a broad spectrum of graph-based applications, and it calls for further research in this evolving field, including broader use of GFMs in other domains and their feasibility in different scenarios.