This paper explores the use of Large Language Models (LLMs) to enhance recommendation diversity through re-ranking. The authors begin with an informal study to verify that LLMs can perform re-ranking tasks and understand the concept of item diversity. They then design a rigorous methodology in which LLMs are prompted to generate diverse rankings from candidate rankings using various prompt templates. Experiments are conducted on anime, movie, and book datasets, using state-of-the-art LLMs from the GPT and Llama families. The results show that LLM-based re-rankers can interpret the re-ranking task, improving diversity while sacrificing some relevance. While LLM-based re-rankers outperform random re-rankers, they are still inferior to traditional re-rankers in terms of relevance-aware metrics. OpenAI's models, particularly ChatGPT, perform better than Meta's models like Llama2-13B-Chat. The effectiveness of different prompt templates varies, indicating the need for tailored designs. Traditional greedy approaches are faster and less resource-intensive, making them more practical. However, the study shows that LLM-based re-ranking has significant promise and will become more competitive as LLMs improve and costs decrease.
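To make the comparison concrete, the "traditional greedy" baseline mentioned above typically works like Maximal Marginal Relevance (MMR): items are selected one at a time, each time picking the candidate that best trades off its relevance against its similarity to items already chosen. The sketch below is illustrative only; the function names, the genre-overlap similarity, and the toy data are assumptions for exposition, not the paper's actual implementation.

```python
def greedy_diverse_rerank(candidates, relevance, similarity, k, lam=0.5):
    """Greedily pick k items, trading off relevance against similarity
    to already-selected items (MMR-style diversity re-ranking)."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr_score(item):
            # Penalize items too similar to anything already selected.
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * max_sim
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected

# Toy example (hypothetical items): Jaccard similarity over genre sets.
genres = {"A": {"action"}, "B": {"action"}, "C": {"romance"}}
rel = {"A": 0.9, "B": 0.85, "C": 0.6}
sim = lambda x, y: len(genres[x] & genres[y]) / len(genres[x] | genres[y])

print(greedy_diverse_rerank(["A", "B", "C"], rel, sim, k=2))  # → ['A', 'C']
```

Note how item B, despite being more relevant than C, is skipped because it duplicates the genre of the already-selected A; this relevance-for-diversity trade is exactly what the paper asks LLMs to perform via prompting instead of an explicit scoring rule.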