Large Language Models are Learnable Planners for Long-Term Recommendation


July 14-18, 2024 | Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng
This paper proposes the Bi-level Learnable LLM Planner (BiLLP), a framework for long-term recommendation that leverages the planning capabilities of Large Language Models (LLMs) to improve long-term user engagement. Traditional Reinforcement Learning (RL) approaches to recommendation struggle with sparse data and overfitting, which leads to sub-optimal long-term performance. BiLLP addresses these issues by decomposing learning into macro-learning and micro-learning. Macro-learning pairs a Planner with a Reflector: the Planner generates high-level plans conditioned on past reflections, and the Reflector distills historical experiences into those reflections. Micro-learning uses an Actor-Critic component to personalize recommendations: the Actor translates the high-level plans into executable recommendation actions, and the Critic evaluates the effectiveness of these actions to refine the Actor's policy. The framework is validated through extensive experiments in simulated environments, showing that BiLLP outperforms existing RL-based and LLM-based methods in long-term engagement and cumulative reward. The results also show that BiLLP mitigates the Matthew effect and filter bubbles, producing more diverse and engaging recommendations, and that it remains robust across different environments and base LLM models, indicating adaptability to real-world scenarios. The study highlights the potential of integrating LLM planning capabilities into recommender systems to achieve better long-term user engagement.
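The following is a minimal, illustrative sketch of the macro/micro loop described above. It assumes a generic text-in/text-out LLM callable and a simulated recommendation environment with a reset/step interface; the class names (Planner, Reflector, Actor, Critic) mirror the paper's components, but every signature here is a hypothetical reconstruction rather than the authors' released implementation.

```python
"""Illustrative sketch of a BiLLP-style bi-level loop (hypothetical interfaces)."""
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class MacroMemory:
    """Stores Reflector summaries that the Planner reads at plan time."""
    reflections: List[str] = field(default_factory=list)


class Planner:
    """Macro level: asks an LLM for a high-level plan, conditioned on past reflections."""
    def __init__(self, llm: Callable[[str], str], memory: MacroMemory):
        self.llm, self.memory = llm, memory

    def plan(self, user_profile: str) -> str:
        prompt = (
            "Past reflections:\n" + "\n".join(self.memory.reflections)
            + f"\nUser profile: {user_profile}\n"
            + "Write a high-level plan to maximize long-term engagement."
        )
        return self.llm(prompt)


class Actor:
    """Micro level: grounds the plan into a concrete recommendation action."""
    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm

    def act(self, plan: str, state: str) -> str:
        return self.llm(f"Plan: {plan}\nCurrent state: {state}\nRecommend one item.")


class Critic:
    """Scores actions; its feedback would be used to refine the Actor's policy."""
    def evaluate(self, state: str, action: str, reward: float) -> float:
        # Placeholder value estimate; a learned value function would go here.
        return reward


class Reflector:
    """After an episode, distills what worked into text the Planner can reuse."""
    def __init__(self, llm: Callable[[str], str], memory: MacroMemory):
        self.llm, self.memory = llm, memory

    def reflect(self, trajectory: List[Tuple[str, str, float]]) -> None:
        summary = self.llm(f"Summarize lessons from this trajectory: {trajectory}")
        self.memory.reflections.append(summary)


def run_episode(env, planner: Planner, actor: Actor, critic: Critic,
                reflector: Reflector, max_steps: int = 20) -> None:
    """One recommendation session: plan once, act step by step, then reflect."""
    state = env.reset()                      # assumed simulator interface
    plan = planner.plan(user_profile=state)
    trajectory: List[Tuple[str, str, float]] = []
    for _ in range(max_steps):
        action = actor.act(plan, state)
        state, reward, done = env.step(action)
        critic.evaluate(state, action, reward)
        trajectory.append((state, action, reward))
        if done:
            break
    reflector.reflect(trajectory)
```

In the paper's terms, the Critic's feedback drives micro-learning (refining the Actor's policy) and the Reflector's summaries drive macro-learning (updating the memory the Planner plans from); the stubs above only mark where those updates would occur.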