Efficient Tuning and Inference for Large Language Models on Textual Graphs


24 Jul 2024 | Yun Zhu, Yaoke Wang, Haizhou Shi, Siliang Tang
The paper "Efficient Tuning and Inference for Large Language Models on Textual Graphs" by Yun Zhu, Yaoke Wang, Haizhou Shi, and Siliang Tang proposes ENGINE, an efficient and effective framework for integrating large language models (LLMs) with textual graphs. The key insight is to combine LLMs and graph neural networks (GNNs) through a tunable side structure, significantly reducing training complexity without compromising the joint model's capacity. ENGINE introduces a lightweight GNN-based side structure (G-Ladder) alongside each layer of the LLM, allowing for the explicit modeling of structural information in textual graphs. The method is designed to be parameter-efficient and memory-efficient, with precomputed node embeddings stored in a cache to reduce training time. Additionally, ENGINE incorporates dynamic early exit to accelerate inference, achieving up to 5x faster inference with minimal performance loss. Extensive experiments on various textual graph datasets demonstrate ENGINE's superior performance and efficiency compared to existing methods. The paper also includes sensitivity analysis and an ablation study to validate the effectiveness of the proposed components.The paper "Efficient Tuning and Inference for Large Language Models on Textual Graphs" by Yun Zhu, Yaoke Wang, Haizhou Shi, and Siliang Tang proposes ENGINE, an efficient and effective framework for integrating large language models (LLMs) with textual graphs. The key insight is to combine LLMs and graph neural networks (GNNs) through a tunable side structure, significantly reducing training complexity without compromising the joint model's capacity. ENGINE introduces a lightweight GNN-based side structure (G-Ladder) alongside each layer of the LLM, allowing for the explicit modeling of structural information in textual graphs. The method is designed to be parameter-efficient and memory-efficient, with precomputed node embeddings stored in a cache to reduce training time. Additionally, ENGINE incorporates dynamic early exit to accelerate inference, achieving up to 5x faster inference with minimal performance loss. Extensive experiments on various textual graph datasets demonstrate ENGINE's superior performance and efficiency compared to existing methods. The paper also includes sensitivity analysis and an ablation study to validate the effectiveness of the proposed components.