Rethinking Large Language Model Architectures for Sequential Recommendations


2024 | Hanbing Wang, Xiaorui Liu, Wenqi Fan, Xiangyu Zhao, Venkataramana Kini, Devendra Yadav, Fei Wang, Zhen Wen, Jiliang Tang, Hui Liu
This paper proposes Lite-LLM4Rec, a simplified yet effective large language model (LLM)-based sequential recommendation model that achieves efficient inference and improved performance. The authors identify beam search decoding as the most time-consuming component of LLM-based recommenders and observe that existing item indexing methods lead to redundant computation.

To address these issues, Lite-LLM4Rec introduces a hierarchical LLM structure that encodes each item's context into a compact vector, reducing input length and computational cost. It also replaces beam search decoding with an item projection head that generates recommendations directly, eliminating the complex decoding process. The model is trained with a cross-entropy loss.
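The hierarchical idea above — an item-level encoder collapsing each item's text into one compact vector so the recommendation backbone sees one position per item rather than one per token — can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the embedding size, token counts, and mean pooling are illustrative assumptions standing in for the item LLM encoder.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 64  # hypothetical embedding size, not from the paper

# Token-level representations of three items' text (e.g. titles),
# as an item-level LLM encoder might produce them: 12, 7, and 20 tokens.
item_token_embs = [rng.normal(size=(n_tok, d_model)) for n_tok in (12, 7, 20)]

# Hierarchical step: collapse each item's tokens into a single compact
# vector (mean pooling stands in for the item LLM's summarization here).
item_vecs = np.stack([toks.mean(axis=0) for toks in item_token_embs])

# The recommendation backbone now consumes 3 positions instead of
# 12 + 7 + 20 = 39 token positions, shrinking the input sequence.
print(item_vecs.shape)  # → (3, 64)
```

The payoff is quadratic: since self-attention cost grows with the square of sequence length, shortening the input from token count to item count cuts the backbone's compute substantially.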
Experiments on three public datasets show that Lite-LLM4Rec outperforms existing LLM-based methods in both recommendation performance and inference efficiency, including a 46.8% improvement in Recall@10 on the ML-1m dataset. Ablation studies confirm that the hierarchical LLM structure and the item projection head are both crucial to this performance, demonstrating that Lite-LLM4Rec is a promising solution for efficient sequential recommendation.
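The item projection head that replaces beam search can be sketched in a few lines: a single linear layer maps the backbone's final hidden state to one score per catalog item, so a top-k recommendation is a single sort rather than an autoregressive decode, and training reduces to a standard cross-entropy over the catalog. The sizes, the ground-truth item ID, and the plain linear head below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_items = 64, 1000  # hypothetical sizes, not from the paper

# Final hidden state of the recommendation backbone for one user sequence.
h = rng.normal(size=d_model)

# Item projection head: one linear layer mapping the hidden state to a
# score per catalog item (replacing beam search over token IDs).
W = rng.normal(size=(n_items, d_model)) / np.sqrt(d_model)
b = np.zeros(n_items)
logits = W @ h + b  # shape (n_items,)

# Softmax over the whole catalog in one shot.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Top-10 recommendation is a single sort — no autoregressive decoding.
top10 = np.argsort(-logits)[:10]

# Training signal: cross-entropy against the ground-truth next item
# (item 42 is a hypothetical label for illustration).
target = 42
loss = -np.log(probs[target])
```

Because inference is one forward pass plus a sort, the per-request cost no longer scales with beam width or generated token count, which is where the paper locates most of the latency of prior LLM-based recommenders.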