11 Jul 2024 | Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li
The paper introduces GLBench, a comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios. GLBench aims to address the lack of a unified and fair evaluation framework in the GraphLLM community. The benchmark includes a wide range of GraphLLM models, such as LLM-as-enhancer, LLM-as-predictor, and LLM-as-aligner, along with traditional graph neural network (GNN) and pre-trained language model (PLM) baselines. Through extensive experiments on real-world datasets, the paper uncovers several key findings:
1. **Supervised Performance**: GraphLLM methods outperform traditional GNN and PLM baselines, with LLM-as-enhancer models showing the most robust performance.
2. **Zero-Shot Transfer**: LLMs demonstrate strong zero-shot capabilities but may suffer from data leakage issues. Both structural and semantic information are crucial for effective zero-shot transfer.
3. **Efficiency**: Current GraphLLM methods generally incur higher time and space complexity than GNNs, highlighting the need for more efficient implementations.
4. **Simple Baseline**: A training-free baseline that combines structural and semantic information can outperform several GraphLLM methods tailored for zero-shot scenarios.
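The fourth finding suggests what such a training-free baseline might look like. The sketch below is an illustration, not the paper's actual method: it assumes node texts and label descriptions have been embedded by some language model, propagates node embeddings over the graph for the structural signal (SGC-style, parameter-free smoothing), and assigns each node the label whose description embedding is most similar (cosine similarity) for the semantic signal.

```python
import numpy as np

def zero_shot_predict(X, A, label_emb, hops=2):
    """Hypothetical training-free zero-shot node classifier:
    smooth node text embeddings over the graph (structure),
    then match each node to the nearest label description (semantics)."""
    n = A.shape[0]
    # Symmetrically normalized adjacency with self-loops
    A_hat = A + np.eye(n)
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt
    H = X.copy()
    for _ in range(hops):
        H = S @ H  # parameter-free feature propagation
    # Cosine similarity between smoothed node and label embeddings
    H = H / np.linalg.norm(H, axis=1, keepdims=True)
    L = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    return (H @ L.T).argmax(axis=1)

# Toy example: random vectors stand in for LLM text embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))          # 6 nodes, 8-dim text embeddings
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], float)  # two disjoint triangles
label_emb = rng.normal(size=(2, 8))  # 2 class-description embeddings
preds = zero_shot_predict(X, A, label_emb)
print(preds.shape)  # (6,)
```

Because nothing here is trained, the same function can be applied to any text-attributed graph whose node texts and class names can be embedded, which is exactly the zero-shot transfer setting the benchmark evaluates.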
The paper also discusses the limitations of GLBench, such as its focus on node classification tasks and the absence of non-text-attributed graphs. Despite these limitations, GLBench is expected to significantly contribute to the development and understanding of GraphLLM methods.