Understanding Biases in ChatGPT-based Recommender Systems: Provider Fairness, Temporal Stability, and Recency


Yashar Deldjoo | July 2024
This paper investigates biases in ChatGPT-based recommender systems, focusing on provider fairness, temporal stability, and recency. The study explores how prompt design strategies (structure, system role, and intent) affect evaluation metrics such as provider fairness, catalog coverage, temporal stability, and recency. Two experiments were conducted: the first examined classical top-K recommendations, while the second evaluated sequential in-context learning (ICL).

In the first experiment, seven prompt scenarios were tested, covering accuracy-oriented, beyond-accuracy, and reasoning-focused prompts. Accuracy-oriented prompts, such as Simple and Chain-of-Thought (CoT), outperformed diversification prompts, which reduced accuracy by up to 50%. Embedding fairness into the system role (e.g., "act as a fair recommender") proved more effective than placing fairness directives within the prompt itself. Diversification prompts led to recommending newer movies with a broader genre distribution than traditional collaborative filtering (CF) models.

The second experiment compared zero-shot and few-shot learning scenarios in sequential ICL. Results showed that including user demographic information in prompts affected model biases and stereotypes. Zero-shot learning achieved higher NDCG and coverage, while ICL-2 showed slight improvements in hit rate (HR) when age-group context was included. The study highlights the potential and challenges of integrating large language models (LLMs) into recommendation systems, emphasizing the need for fair and diverse recommendations.

Key contributions include an in-depth analysis of prompt design, system roles, and stability over time in ChatGPT-based RecLLMs. The study identifies biases in RecLLMs, such as item fairness, genre preference bias, and temporal bias, and compares them with those of traditional CF models. It also examines the impact of user profile attributes and demographic information on fairness and accuracy. The findings shed light on the biases of RecLLMs, particularly in provider fairness and catalog coverage, and highlight the importance of prompt design and learning strategies in improving item fairness without compromising accuracy. Overall, the research underscores both the potential and the challenges of integrating LLMs into recommendation systems, paving the way for future research.
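To make the distinction between system-role fairness and in-prompt fairness concrete, the sketch below shows two hypothetical chat payloads for a top-K movie request using the OpenAI chat API. The model name, prompt wording, and user profile are illustrative assumptions, not the paper's actual templates.

```python
# Illustrative sketch (not the paper's exact prompts): two ways of expressing a
# fairness goal when asking ChatGPT for top-K movie recommendations.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical user profile and task text, for illustration only.
user_profile = "Liked: The Matrix, Inception, Blade Runner. Disliked: Titanic."
task = f"User history: {user_profile}\nRecommend 10 movies as a numbered list."

# (a) Fairness embedded in the system role -- reported as the more effective option.
messages_system_fair = [
    {"role": "system", "content": "You are a fair recommender: give less popular "
                                  "and niche items a chance to appear in the list."},
    {"role": "user", "content": task},
]

# (b) Fairness expressed as a directive inside the user prompt.
messages_prompt_fair = [
    {"role": "system", "content": "You are a movie recommender."},
    {"role": "user", "content": task + "\nAlso make sure the list is fair to "
                                       "less popular providers."},
]

for name, messages in [("system-role fairness", messages_system_fair),
                       ("in-prompt fairness", messages_prompt_fair)]:
    response = client.chat.completions.create(model="gpt-3.5-turbo",
                                              messages=messages,
                                              temperature=0)
    print(name, "->", response.choices[0].message.content[:200])
```

The same message-list structure extends to the sequential ICL setting: a zero-shot prompt would contain only the task, while a few-shot (ICL) prompt would prepend one or more prior interaction examples, optionally including demographic context such as the user's age group.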
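For reference, the sketch below shows how the accuracy and coverage metrics cited in the summary (HR, NDCG, catalog coverage) are commonly computed under a leave-one-out protocol with a single held-out relevant item per user. This is a generic formulation with toy data, not code from the paper.

```python
# Minimal metric sketch: HR@K, binary-relevance NDCG@K, and catalog coverage.
import math

def hit_rate_at_k(recommended: list[str], relevant: str, k: int = 10) -> float:
    """1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if relevant in recommended[:k] else 0.0

def ndcg_at_k(recommended: list[str], relevant: str, k: int = 10) -> float:
    """Binary-relevance NDCG: gain of 1 discounted by the rank of the held-out item."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item == relevant:
            return 1.0 / math.log2(rank + 1)
    return 0.0

def catalog_coverage(all_recommendations: list[list[str]], catalog_size: int) -> float:
    """Fraction of the catalog appearing in at least one user's recommendation list."""
    distinct = {item for rec_list in all_recommendations for item in rec_list}
    return len(distinct) / catalog_size

# Toy usage
recs = [["A", "B", "C"], ["B", "D", "E"]]
print(hit_rate_at_k(recs[0], "C", k=3))         # 1.0
print(round(ndcg_at_k(recs[0], "C", k=3), 3))   # 0.5
print(catalog_coverage(recs, catalog_size=10))  # 0.5
```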