Provable Multi-Party Reinforcement Learning with Diverse Human Feedback

8 Mar 2024 | Huiying Zhong, Zhun Deng, Weijie J. Su, Zhiwei Steven Wu, Linjun Zhang
This paper introduces a theoretical framework for multi-party reinforcement learning with diverse human feedback (multi-party RLHF), addressing the limitations of traditional single-party RLHF. The key contributions are: (1) a general framework for aligning with multiple heterogeneous parties via social welfare functions; (2) sample complexity bounds and efficiency/fairness guarantees for optimizing diverse welfare functions such as Nash, Utilitarian, and Leximin; and (3) an extension of the analysis to a reward-free setting with pessimistic variants of the von Neumann winner.

The work highlights that multi-party RLHF is statistically more complex than single-party RLHF and demonstrates the advantages of multi-party alignment in balancing diverse preferences. For the Nash welfare function, theoretical guarantees show that the learned policies are approximately Pareto efficient and satisfy the Pigou-Dalton principle. The results are generalized to Markov Decision Processes (MDPs) and extended to reward-free models, where individual preferences are no longer assumed to be consistent with a reward model.

The paper also compares different social welfare functions: Nash welfare is robust to affine transformations and balances average and worst-case performance, while Utilitarian and Leximin welfare require more stringent coverage assumptions. Overall, the work provides a comprehensive theoretical foundation for multi-party RLHF, emphasizing that aligning with diverse human preferences requires more data and more careful modeling than the single-party setting.
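To make the objectives above concrete, the following is a minimal sketch of the standard definitions of the three social welfare functions and of the von Neumann winner. The notation is assumed for illustration (r_i(π) denotes party i's expected reward under policy π, for parties i = 1, ..., N) and may differ from the paper's exact formulation.

```latex
% Illustrative definitions (assumed notation: r_i(\pi) is party i's expected
% reward under policy \pi, for parties i = 1, ..., N).

% Utilitarian welfare: the average of party utilities.
\mathrm{USW}(\pi) = \frac{1}{N} \sum_{i=1}^{N} r_i(\pi)

% Nash welfare: the geometric mean of party utilities
% (equivalently, maximize the sum of log-utilities).
\mathrm{NSW}(\pi) = \Big( \prod_{i=1}^{N} r_i(\pi) \Big)^{1/N}

% Leximin welfare: maximize the utility of the worst-off party first,
% then, among ties, the second worst-off, and so on; the leading term is
\mathrm{LSW}(\pi) = \min_{1 \le i \le N} r_i(\pi)

% Von Neumann winner (reward-free setting): a policy \pi^\star that wins a
% preference duel against any competing policy at least half the time,
\mathbb{E}_{a \sim \pi^\star,\ a' \sim \pi}\big[\mathbb{P}(a \succ a')\big] \;\ge\; \tfrac{1}{2}
\quad \text{for all policies } \pi .
```

For example, rescaling any single party's utility by a positive constant multiplies the Nash welfare by a constant factor and leaves the Nash-optimal policy unchanged, which illustrates the robustness property mentioned above; the paper's reward-free analysis works with pessimistic variants of the von Neumann winner condition shown last.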