GLBench: A Comprehensive Benchmark for Graph with Large Language Models

11 Jul 2024 | Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li
GLBench is the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios, providing a fair and thorough comparison against traditional baselines such as graph neural networks (GNNs) and pre-trained language models (PLMs). The data and code of the benchmark are available at https://github.com/NineAbyss/GLBench.

GLBench supports two learning scenarios. In supervised learning, GraphLLM models are trained to predict unlabeled nodes that share the label space of the training set. In zero-shot learning, models are trained on labeled source graphs and must produce satisfactory predictions on completely different target graphs with distinct label spaces. The benchmark covers the three main categories of GraphLLM methods: LLM-as-enhancer, LLM-as-predictor, and LLM-as-aligner.

Through extensive experiments on real-world datasets with consistent data processing and splitting strategies, the authors uncover several key findings: (i) GraphLLM methods outperform traditional baselines in supervised settings, with LLM-as-enhancers showing the most robust performance; (ii) using LLMs as predictors is less effective and often leads to uncontrollable output issues; (iii) there are no clear scaling laws for current GraphLLM methods, i.e., performance does not clearly improve with model size; and (iv) both structure and semantics are crucial for effective zero-shot transfer, and a simple baseline proposed by the authors can even outperform several models tailored for zero-shot scenarios. The paper also highlights the efficiency problem of existing GraphLLM methods, underscoring the need for approaches that are both efficient and effective for zero-shot graph learning.

In summary, GLBench enables fair comparisons among different categories of methods within the emerging GraphLLM paradigm. The authors make three key contributions: introducing GLBench as the first comprehensive benchmark for GraphLLM methods, systematically analyzing existing methods along multiple dimensions, and releasing the benchmark repository publicly for future research.
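To make the LLM-as-enhancer category concrete, the sketch below shows what such a pipeline typically looks like: a text encoder turns node descriptions into features, and a small GNN is then trained on them for supervised node classification. This is a minimal illustration only; the specific encoder (sentence-transformers "all-MiniLM-L6-v2") and the two-layer GCN are assumptions for the example, not the exact models or implementation used in GLBench.

```python
# Minimal sketch of an LLM-as-enhancer pipeline (illustrative assumptions,
# not the GLBench implementation): a text encoder produces node features,
# and a GNN is trained on them for node classification.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer  # assumed encoder choice
from torch_geometric.nn import GCNConv

def encode_node_texts(texts):
    # Any PLM/LLM embedding model works here; MiniLM is just a cheap example.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    return torch.tensor(encoder.encode(texts), dtype=torch.float)

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

def train_supervised(x, edge_index, y, train_mask, num_classes, epochs=100):
    # Supervised scenario: training and test nodes share the same label space.
    model = GCN(x.size(1), 128, num_classes)
    opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    for _ in range(epochs):
        opt.zero_grad()
        out = model(x, edge_index)
        loss = F.cross_entropy(out[train_mask], y[train_mask])
        loss.backward()
        opt.step()
    return model
```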
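The zero-shot scenario can be illustrated in a similar way. Because the target graph has a label space distinct from the source graph, a fixed classification head cannot be reused; instead, labels can be matched through the semantics of their text descriptions. The sketch below shows one such purely semantic matching scheme as an assumed illustration; it is not the specific simple baseline proposed in the paper, and the benchmark's findings suggest a stronger baseline would also exploit graph structure (e.g., by aggregating predictions over neighboring nodes).

```python
# Illustrative zero-shot sketch (an assumed design, not the paper's baseline):
# match node texts to target label descriptions by embedding similarity.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

def zero_shot_predict(node_texts, target_label_names):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder
    node_emb = torch.tensor(encoder.encode(node_texts), dtype=torch.float)
    label_emb = torch.tensor(encoder.encode(target_label_names), dtype=torch.float)
    # Cosine similarity between each node and each label description;
    # the most similar label description is taken as the prediction.
    sims = F.normalize(node_emb, dim=-1) @ F.normalize(label_emb, dim=-1).T
    return sims.argmax(dim=-1)

# Example: predict labels on a target graph whose label space was never seen
# during training on the source graphs.
preds = zero_shot_predict(
    ["A paper on convolutional architectures for image recognition.",
     "A study of reinforcement learning for robotic control."],
    ["Computer Vision", "Reinforcement Learning", "Databases"],
)
```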