May 13-17, 2024 | Jun Hu, Wenwen Xia, Xiaolu Zhang, Chilin Fu, Weichang Wu, Zhaoxin Huan, Ang Li, Zuoli Tang, Jun Zhou
This paper introduces SAID, a framework that enhances sequential recommender systems (SRS) by leveraging large language models (LLMs) to learn semantically aligned item embeddings. SAID addresses a limitation of existing methods by explicitly learning embeddings that preserve fine-grained semantic information from item texts. The framework consists of two stages: (1) semantically aligned embedding learning, where item IDs are transformed into embeddings aligned with their textual descriptions via an LLM, and (2) model-agnostic sequential recommender training, where these embeddings feed lightweight downstream recommendation models. Because SAID does not rely on long token sequences, it avoids the inference inefficiencies of earlier LLM-based methods while also improving accuracy. Experiments on six public datasets show that SAID outperforms baselines by up to 15% in NDCG@10 and 14% in Recall@10, and it achieves a 3.07% improvement in cost per mille (CPM) on Alipay's online advertising platform. With an online response time under 20 milliseconds, the framework is efficient enough for industrial deployment, demonstrating the effectiveness of LLM-based semantic embedding learning for SRS.
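The two-stage design is compact enough to sketch. The snippet below is a minimal, hypothetical illustration rather than SAID's actual implementation: it assumes a cosine alignment loss against frozen LLM text embeddings in stage 1 and a small two-layer Transformer over the frozen, aligned item embeddings in stage 2. All class names, dimensions, and the specific loss are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class AlignedItemEmbedding(nn.Module):
    """Stage 1 (hypothetical sketch): learn one embedding per item ID that is
    aligned with the LLM's representation of the item's text description."""

    def __init__(self, num_items: int, llm_dim: int):
        super().__init__()
        # A learnable vector per item ID, living in the LLM's embedding space.
        self.item_emb = nn.Embedding(num_items, llm_dim)

    def alignment_loss(self, item_ids: torch.Tensor, text_embs: torch.Tensor):
        # Pull each item embedding toward the frozen LLM embedding of its text
        # description; cosine distance is one plausible alignment objective
        # (an assumption here, not necessarily the paper's loss).
        e = self.item_emb(item_ids)
        return 1.0 - nn.functional.cosine_similarity(e, text_embs, dim=-1).mean()


class LightweightRecommender(nn.Module):
    """Stage 2 (hypothetical sketch): a small Transformer encoder over the
    frozen, semantically aligned item embeddings from stage 1."""

    def __init__(self, item_emb: nn.Embedding, llm_dim: int, hidden: int = 64):
        super().__init__()
        self.item_emb = item_emb
        self.item_emb.weight.requires_grad_(False)  # embeddings fixed downstream
        self.proj = nn.Linear(llm_dim, hidden)      # shrink to a lightweight width
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, seq: torch.Tensor):
        # seq: (batch, seq_len) of item IDs for a user's interaction history.
        h = self.encoder(self.proj(self.item_emb(seq)))
        return h[:, -1]  # representation used to score the next item
```

Note how the two stages decouple cleanly: the (potentially expensive) LLM is only consulted offline in stage 1, so the stage-2 recommender serves requests with a small projection and Transformer, which is consistent with the sub-20-millisecond online latency the paper reports.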