PAL: PLURALISTIC ALIGNMENT FRAMEWORK FOR LEARNING FROM HETEROGENEOUS PREFERENCES

12 Jun 2024 | Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak
The paper introduces the Pluralistic Alignment Framework (PAL), a novel approach to aligning large foundation models with human preferences. Traditional methods often assume a universal preference shared by all humans, which fails to capture the diversity and heterogeneity of human preferences. PAL addresses this by incorporating plurality from the ground up, using the ideal point model to view alignment through a lens of preference comparisons. The framework employs mixture modeling to capture the plurality of population preferences while learning a common preference latent space across different preferences. This enables the use of the penultimate-layer representation of large foundation models and simple MLP layers to learn reward functions that are competitive with state-of-the-art (SoTA) models, achieving significant efficiency gains. Experiments on synthetic, semi-synthetic, and real datasets demonstrate PAL's effectiveness in learning from diverse preferences, outperforming existing homogeneous reward models. The paper also highlights the limitations of current preference datasets, which are often created using rigid rubrics that limit heterogeneity, and calls for more nuanced data collection approaches.
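To make the described architecture concrete, below is a minimal PyTorch sketch of an ideal-point mixture reward model in the spirit of PAL. This is an illustration under assumptions, not the authors' implementation: the class name `PALRewardSketch`, the two-layer MLP, the number of prototypes, and the per-user mixing weights are hypothetical choices. The sketch only reflects the ideas summarized above: items are represented by frozen penultimate-layer embeddings of a foundation model, a small MLP maps them into a shared preference latent space, each user's "ideal point" is a convex combination of a few shared prototypes, and a pairwise preference is modeled from distances in that latent space.

```python
import torch
import torch.nn as nn


class PALRewardSketch(nn.Module):
    """Hypothetical sketch of an ideal-point mixture reward model (PAL-style).

    Item inputs are assumed to be frozen penultimate-layer embeddings from a
    foundation model; only the MLP, the prototypes, and the per-user mixture
    weights are trained.
    """

    def __init__(self, embed_dim: int, latent_dim: int,
                 num_prototypes: int, num_users: int):
        super().__init__()
        # Simple MLP that maps frozen embeddings into a shared preference latent space.
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, latent_dim),
            nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )
        # K prototypical ideal points shared across the population.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, latent_dim))
        # Per-user logits over prototypes; softmax gives convex mixture weights.
        self.user_logits = nn.Parameter(torch.zeros(num_users, num_prototypes))

    def ideal_point(self, user_ids: torch.Tensor) -> torch.Tensor:
        # Each user's ideal point is a convex combination of the shared prototypes.
        weights = torch.softmax(self.user_logits[user_ids], dim=-1)  # (batch, K)
        return weights @ self.prototypes                              # (batch, latent_dim)

    def reward(self, user_ids: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
        # Higher reward = smaller squared distance to the user's ideal point.
        z = self.mlp(item_emb)
        a = self.ideal_point(user_ids)
        return -((z - a) ** 2).sum(dim=-1)

    def forward(self, user_ids, emb_preferred, emb_rejected):
        # Bradley-Terry-style probability that the preferred item wins the comparison.
        r_pref = self.reward(user_ids, emb_preferred)
        r_rej = self.reward(user_ids, emb_rejected)
        return torch.sigmoid(r_pref - r_rej)
```

In training, one would minimize binary cross-entropy between these pairwise win probabilities and the observed preference labels, keeping the foundation-model backbone frozen so that only the lightweight MLP, the prototypes, and the per-user mixture weights are learned.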