3 Jul 2024 | Nuo Chen, Yuhan Li, Jianheng Tang, Jia Li
**GraphWiz: An Instruction-Following Language Model for Graph Computational Problems**
**Authors:** Nuo Chen, Yuhan Li, Jianheng Tang, and Jia Li
**Abstract:**
Large language models (LLMs) have achieved significant success across many domains, but their ability to understand and solve complex graph problems remains underexplored. To address this gap, the authors introduce GraphInstruct, a novel instruction-tuning dataset designed to enable LLMs to tackle a broad spectrum of graph problems through explicit reasoning paths. Using GraphInstruct, they build GraphWiz, an open-source LLM capable of solving various graph computational problems while generating clear reasoning processes. To enhance the model's performance and reliability, they integrate the Direct Preference Optimization (DPO) framework into the graph problem-solving context. The improved model, GraphWiz-DPO, achieves an average accuracy of 65% across nine tasks of varying complexity, surpassing GPT-4, which averages 43.8%. The study also investigates the relationship between training data volume and model performance, emphasizing the risk of overfitting as data volume increases. Additionally, the authors examine the transferability of the model across different tasks and datasets, demonstrating robust zero-shot generalization.
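The abstract does not enumerate the nine tasks, but reachability-style queries are a typical graph computational problem in this setting. As an illustrative sketch only (not the paper's code or data), a classical BFS baseline for the kind of connectivity question GraphWiz answers in natural language with an explicit reasoning path:

```python
from collections import deque

def connected(n, edges, s, t):
    """BFS check: is node t reachable from node s in an undirected
    graph with n nodes? A classical algorithmic baseline for the kind
    of connectivity query an instruction-tuned model would answer
    with a step-by-step reasoning path."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# Path 0-1-2 exists; node 3 is isolated.
print(connected(4, [(0, 1), (1, 2)], 0, 2))  # True
print(connected(4, [(0, 1), (1, 2)], 0, 3))  # False
```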
**Contributions:**
1. **GraphInstruct Dataset:** A large-scale instruction-tuning dataset specifically designed for training LLMs on various graph computational tasks, enabling models to output explicit reasoning paths and arrive at final answers.
2. **GraphWiz Model:** The first open-source LLM specialized for solving graph problems of various types and scales through explicit reasoning, outperforming current best closed-source models like GPT-4.
3. **Performance Analysis:** Detailed analysis of factors impacting model performance, including training data volume and sampling strategies for dispreferred samples within the DPO framework.
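For context on the DPO framework referenced above: DPO fine-tunes the policy directly on preference pairs, without a separate reward model. The standard objective (from the original DPO formulation, not specific to this paper) is

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
\left[
\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)
\right]
```

where, in the graph problem-solving context, $y_w$ is a preferred (correct) reasoning path, $y_l$ a dispreferred one, $\pi_{\mathrm{ref}}$ the supervised fine-tuned reference model, $\sigma$ the logistic function, and $\beta$ a temperature controlling deviation from the reference. The sampling strategy for $y_l$ is one of the factors the authors analyze.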
**Keywords:** graph algorithms, large language models, instruction tuning
**CCS Concepts:**
- Computing methodologies → Artificial intelligence
- Mathematics of computing → Graph algorithms
**ACM Reference Format:**
Nuo Chen, Yuhan Li, Jianheng Tang, and Jia Li. 2024. GraphWiz: An Instruction-Following Language Model for Graph Computational Problems. In KDD'24: SIGKDD Conference on Knowledge Discovery and Data Mining, August 25-29, 2024, Barcelona, Spain. ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3637528.3672010