The paper introduces Group Robust Preference Optimization (GRPO), a method for robustly aligning large language models (LLMs) with the preferences of diverse groups. Traditional reinforcement learning from human feedback (RLHF) often takes a "one-size-fits-all" approach, optimizing a single preference model, which can lead to imbalanced outcomes and poor performance for minority groups. GRPO addresses this by adaptively weighting the importance of different groups, prioritizing those with the worst cumulative loss, and optimizing for worst-case group performance. The method is analyzed theoretically for convergence and feasibility within the log-linear policy class. Empirical evaluations on synthetic and real-world datasets show that GRPO significantly improves performance for underperforming groups, reduces loss imbalances across groups, and increases probability accuracies compared with non-robust baselines. The paper also discusses related work and provides detailed experimental results in support of its main contributions.
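To make the adaptive-weighting idea concrete, here is a minimal sketch in PyTorch of a group-robust, DPO-style preference objective: per-group losses are aggregated under a weight vector, and the weights are then updated with an exponentiated-gradient (mirror-ascent) step so that groups with larger loss receive more emphasis. The function names, step size, and batch handling below are illustrative assumptions for exposition, not the paper's exact procedure.

```python
import torch


def group_robust_weight_update(group_losses, weights, step_size=0.1):
    """Exponentiated-gradient (mirror-ascent) update of the group weights.

    Groups with larger loss receive exponentially more weight, so the next
    policy update focuses on the currently worst-off group.
    """
    new_weights = weights * torch.exp(step_size * group_losses)
    return new_weights / new_weights.sum()


def weighted_dpo_loss(logratio_chosen, logratio_rejected, group_ids, weights, beta=0.1):
    """DPO-style preference loss, re-weighted by the current group weights.

    logratio_*: log pi_theta(y|x) - log pi_ref(y|x) for the chosen / rejected
    response of each preference pair in the batch.
    """
    margins = beta * (logratio_chosen - logratio_rejected)
    per_example = -torch.nn.functional.logsigmoid(margins)

    num_groups = weights.numel()
    group_losses = torch.zeros(num_groups)
    for g in range(num_groups):
        mask = group_ids == g
        if mask.any():
            group_losses[g] = per_example[mask].mean()

    # Policy objective: weighted (worst-case-leaning) sum of group losses.
    robust_loss = (weights * group_losses).sum()
    return robust_loss, group_losses.detach()


# Toy usage with random log-ratios for two groups (group 1 is a minority).
torch.manual_seed(0)
weights = torch.full((2,), 0.5)
chosen = torch.randn(8, requires_grad=True)
rejected = torch.randn(8)
group_ids = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1])

loss, group_losses = weighted_dpo_loss(chosen, rejected, group_ids, weights)
loss.backward()                                                # policy gradient step
weights = group_robust_weight_update(group_losses, weights)    # re-weight groups
```

The alternating structure (a gradient step on the weighted policy loss, followed by a multiplicative update of the group weights) is one standard way to approximate a min-max, worst-case-group objective; the paper's own algorithm and step-size schedule should be consulted for the precise updates and their convergence analysis.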