Data-efficient Fine-tuning for LLM-based Recommendation


July 14-18, 2024 | Xinyu Lin, Wenjie Wang, Yongqi Li, Shuo Yang, Fuli Feng, Yinwei Wei, Tat-Seng Chua
This paper proposes DEALRec, a data pruning method for efficient LLM-based recommendation that identifies representative samples for few-shot fine-tuning. The key challenge is reducing the computational cost of fine-tuning large language models (LLMs) on recommendation data while maintaining performance. Existing coreset selection methods rely on suboptimal heuristics or costly optimizations, making them ill-suited to large-scale recommendation data. The authors therefore set two objectives for data pruning: high accuracy (selecting samples that lead to low empirical risk) and high efficiency (keeping the cost of the pruning process itself low).

DEALRec combines two scores to identify influential samples: an influence score, which estimates the impact of removing a sample on the empirical risk, and an effort score, which measures the effort the LLM requires to fit a specific sample. DEALRec is instantiated on two LLM-based recommender models and validated on three real-world datasets. The results show that DEALRec achieves high accuracy with only 2% of the data, reducing training time by 97%, making it effective at identifying representative samples for LLMs' few-shot fine-tuning and improving both the efficiency and the performance of LLM-based recommendation.