Fine-Tuning Large Language Model Based Explainable Recommendation with Explainable Quality Reward

2024 | Mengyuan Yang, Mengying Zhu, Yan Wang, Linxun Chen, Yilei Zhao, Xiuyuan Wang, Bing Han, Xiaolin Zheng, Jianwei Yin
The paper addresses the low-quality issues in explainable recommendation systems based on large language models (LLMs), such as lack of personalization, inconsistency, and questionable data quality. To tackle these problems, the authors propose a novel LLM-based explainable recommendation model named LLM2ER, which is fine-tuned with two innovative explainable quality reward models (EQR) in a reinforcement learning (RL) paradigm. The fine-tuned model, denoted LLM2ER-EQR, aims to generate personalized, informative, and consistent high-quality explanations. The EQR models comprise a concept consistent reward model (CCR) and a high-quality alignment reward model (HQAR). CCR leverages sentiment-wise candidate concepts to preserve user preferences and item features in generated explanations, while HQAR aligns generated explanations with unpaired high-quality explanations using a generative adversarial network (GAN). Extensive experiments on three real-world datasets demonstrate that LLM2ER-EQR outperforms state-of-the-art methods in generating fluent, diverse, informative, and highly personalized explanations. The paper also discusses limitations and future work, including the ethical considerations of using pre-trained LLMs.
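The summary above describes two reward signals (CCR and HQAR) that jointly guide RL fine-tuning of the explanation generator. Below is a minimal, hypothetical sketch of how such signals could be combined into a single scalar reward. The concept-overlap scoring, the `QualityDiscriminator` stub, the embedding placeholder, and the weighting parameter `alpha` are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: combining a concept-consistency reward (CCR-style) with a
# GAN-discriminator quality reward (HQAR-style) into one scalar signal that
# could weight an RL (e.g., policy-gradient) objective. Illustrative only.

import torch
import torch.nn as nn


def concept_consistency_reward(explanation_tokens, candidate_concepts):
    """Toy stand-in for CCR: fraction of sentiment-wise candidate concepts
    that actually appear in the generated explanation."""
    if not candidate_concepts:
        return 0.0
    hits = sum(1 for concept in candidate_concepts if concept in explanation_tokens)
    return hits / len(candidate_concepts)


class QualityDiscriminator(nn.Module):
    """Toy stand-in for HQAR's discriminator: scores how closely an
    explanation embedding resembles unpaired high-quality explanations."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, explanation_embedding):
        return torch.sigmoid(self.net(explanation_embedding)).squeeze(-1)


def combined_reward(explanation_tokens, explanation_embedding,
                    candidate_concepts, discriminator, alpha=0.5):
    """Scalar reward: weighted sum of the two signals (alpha is an assumption)."""
    r_ccr = concept_consistency_reward(explanation_tokens, candidate_concepts)
    r_hqar = discriminator(explanation_embedding).item()
    return alpha * r_ccr + (1 - alpha) * r_hqar


if __name__ == "__main__":
    disc = QualityDiscriminator(dim=32)
    tokens = ["the", "battery", "life", "is", "excellent"]
    concepts = ["battery", "screen"]
    emb = torch.randn(32)  # placeholder for a sentence embedding of the explanation
    reward = combined_reward(tokens, emb, concepts, disc)
    # In an RL fine-tuning loop, this scalar would weight the generator's
    # policy-gradient loss when updating the explanation-generating LLM.
    print(f"combined reward: {reward:.3f}")
```

In practice, the paper's reward models are learned components rather than the simple heuristics above; the sketch only illustrates the general pattern of feeding multiple explanation-quality signals into an RL fine-tuning objective.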