GLBench: A Comprehensive Benchmark for Graph with Large Language Models

11 Jul 2024 | Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li
GLBench is the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios, providing a fair and thorough comparison against traditional baselines such as graph neural networks (GNNs) and pre-trained language models (PLMs). The data and code of the benchmark are available at https://github.com/NineAbyss/GLBench.

GLBench supports two learning scenarios. In supervised learning, GraphLLM models are trained to predict unlabeled nodes that share the label space of the training set. In zero-shot learning, models are trained on labeled source graphs and must produce satisfactory predictions on completely different target graphs with distinct label spaces. The benchmark covers the three main categories of GraphLLM methods: LLM-as-enhancer, LLM-as-predictor, and LLM-as-aligner.

Through extensive experiments on real-world datasets with consistent data processing and splitting strategies, the authors uncover several key findings: (i) GraphLLM methods outperform traditional baselines in supervised settings, with LLM-as-enhancers showing the most robust performance; (ii) using LLMs as predictors is less effective and often leads to uncontrollable output issues; (iii) there are no clear scaling laws for current GraphLLM methods, i.e., performance does not clearly improve with model size; and (iv) both structure and semantics are crucial for effective zero-shot transfer, and a simple baseline proposed by the authors can even outperform several models tailored for zero-shot scenarios. The paper also highlights the efficiency problem of existing GraphLLM methods, underscoring the need for approaches that are both efficient and effective for zero-shot graph learning.

In summary, GLBench enables fair comparisons among different categories of methods within the emerging GraphLLM paradigm. The authors make three key contributions: introducing GLBench as the first comprehensive benchmark for GraphLLM methods, systematically analyzing existing methods along multiple dimensions, and releasing the benchmark repository publicly for future research.
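To make the LLM-as-enhancer category concrete, the sketch below shows what such a pipeline typically looks like: a text encoder turns node descriptions into features, and a small GNN is then trained on them for supervised node classification. This is a minimal illustration only; the specific encoder (sentence-transformers "all-MiniLM-L6-v2") and the two-layer GCN are assumptions for the example, not the exact models or implementation used in GLBench.

```python
# Minimal sketch of an LLM-as-enhancer pipeline (illustrative assumptions,
# not the GLBench implementation): a text encoder produces node features,
# and a GNN is trained on them for node classification.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer  # assumed encoder choice
from torch_geometric.nn import GCNConv

def encode_node_texts(texts):
    # Any PLM/LLM embedding model works here; MiniLM is just a cheap example.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    return torch.tensor(encoder.encode(texts), dtype=torch.float)

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

def train_supervised(x, edge_index, y, train_mask, num_classes, epochs=100):
    # Supervised scenario: training and test nodes share the same label space.
    model = GCN(x.size(1), 128, num_classes)
    opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    for _ in range(epochs):
        opt.zero_grad()
        out = model(x, edge_index)
        loss = F.cross_entropy(out[train_mask], y[train_mask])
        loss.backward()
        opt.step()
    return model
```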
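The zero-shot scenario can be illustrated in a similar way. Because the target graph has a label space distinct from the source graph, a fixed classification head cannot be reused; instead, labels can be matched through the semantics of their text descriptions. The sketch below shows one such purely semantic matching scheme as an assumed illustration; it is not the specific simple baseline proposed in the paper, and the benchmark's findings suggest a stronger baseline would also exploit graph structure (e.g., by aggregating predictions over neighboring nodes).

```python
# Illustrative zero-shot sketch (an assumed design, not the paper's baseline):
# match node texts to target label descriptions by embedding similarity.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

def zero_shot_predict(node_texts, target_label_names):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder
    node_emb = torch.tensor(encoder.encode(node_texts), dtype=torch.float)
    label_emb = torch.tensor(encoder.encode(target_label_names), dtype=torch.float)
    # Cosine similarity between each node and each label description;
    # the most similar label description is taken as the prediction.
    sims = F.normalize(node_emb, dim=-1) @ F.normalize(label_emb, dim=-1).T
    return sims.argmax(dim=-1)

# Example: predict labels on a target graph whose label space was never seen
# during training on the source graphs.
preds = zero_shot_predict(
    ["A paper on convolutional architectures for image recognition.",
     "A study of reinforcement learning for robotic control."],
    ["Computer Vision", "Reinforcement Learning", "Databases"],
)
```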