Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

2024 | Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen
Rewards-in-Context (RiC) is a method for multi-objective alignment of foundation models with dynamic preference adjustment. RiC conditions a foundation model's response on multiple rewards placed in its prompt context and applies supervised fine-tuning for alignment. The method is simple and adaptive: it requires only supervised fine-tuning of a single foundation model and supports dynamic adjustment to user preferences at inference time. Inspired by the analytical solution of an abstracted convex optimization problem, RiC approaches the Pareto-optimal solution for multiple objectives. Empirical results show that RiC aligns both large language models (LLMs) and diffusion models to diverse rewards using only around 10% of the GPU hours required by multi-objective reinforcement learning (RL) baselines.

The paper discusses the challenges of aligning foundation models with human preferences, a critical step toward helpful and harmless AI systems. Traditional methods such as reinforcement learning from human feedback (RLHF) are costly and unstable, and the multi-dimensional, heterogeneous, and often conflicting nature of human preferences further complicates alignment. RiC addresses these challenges by restructuring the multi-objective alignment problem into three stages: (1) an offline training stage that uses multi-reward conditional supervised fine-tuning, (2) an online training stage that improves on the offline stage with augmented data near the Pareto front, and (3) an inference stage that flexibly adapts to different user preferences.

The paper also discusses the design of preference-to-reward mappings, which are crucial for dynamic inference-time adjustment. The solution to the optimization problem yields a family of justified mappings that realize non-dominated outcomes across preferences, thereby establishing an empirical Pareto front. The paper presents a general formulation and derives simplified, practical formulations for these mappings.

Experiments show that RiC outperforms other baselines, achieving a superior empirical front while requiring significantly fewer computational resources. The results also demonstrate that RiC handles scenarios where prior RLHF algorithms are hindered by the forgetting issue. The paper further discusses RiC's scalability to more than two objectives and its computational efficiency. Results on text generation and text-to-image generation tasks show that RiC outperforms competing methods. The paper concludes that RiC is a highly scalable solution for multi-objective alignment of foundation models with dynamic preference adjustment.
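The core of the offline stage is conditioning each training example on its reward scores directly in the prompt text before ordinary supervised fine-tuning. The sketch below illustrates the idea in Python; the tag names, the normalization range, and the dataset fields are illustrative assumptions, not the paper's exact prompt format.

```python
# Minimal sketch of multi-reward conditional SFT data preparation.
# Reward tags, normalization, and dataset fields are assumptions for
# illustration; the paper's exact prompt format may differ.
from dataclasses import dataclass
from typing import Dict


@dataclass
class Example:
    prompt: str
    response: str
    rewards: Dict[str, float]  # e.g. {"helpfulness": 3.1, "harmlessness": 4.6}


def normalize(score: float, lo: float, hi: float) -> float:
    """Map a raw reward-model score into [0, 1] for stable conditioning."""
    return (score - lo) / (hi - lo + 1e-8)


def to_sft_text(ex: Example, bounds: Dict[str, tuple]) -> str:
    """Prepend reward tags to the prompt so the model learns
    p(response | prompt, r_1, ..., r_k)."""
    tags = " ".join(
        f"<{name}> {normalize(ex.rewards[name], *bounds[name]):.2f}"
        for name in sorted(ex.rewards)
    )
    return f"{tags} {ex.prompt}\n{ex.response}"


if __name__ == "__main__":
    bounds = {"helpfulness": (-5.0, 5.0), "harmlessness": (-5.0, 5.0)}
    ex = Example(
        prompt="User: How do I brew good coffee?\nAssistant:",
        response=" Use freshly ground beans and water just off the boil.",
        rewards={"helpfulness": 3.1, "harmlessness": 4.6},
    )
    print(to_sft_text(ex, bounds))
    # The resulting strings feed standard supervised fine-tuning.
```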
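The online stage can be read as a filter-and-refit loop: sample responses conditioned on ambitious reward targets, score them with the reward models, keep only the non-dominated samples, and fine-tune on them again. This is a hedged sketch under that reading; `generate` and `score` are placeholder callables, and the dominance filter is a simplification of the paper's augmentation procedure.

```python
# Hedged sketch of online augmentation: keep only generated samples that
# are not dominated on any reward dimension, then reuse them as extra SFT
# data. `generate` and `score` are placeholders supplied by the caller.
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, float]]  # (generated text, reward scores)


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    keys = a.keys()
    return all(a[k] >= b[k] for k in keys) and any(a[k] > b[k] for k in keys)


def pareto_front(samples: List[Sample]) -> List[Sample]:
    """Keep the non-dominated samples; these approximate the empirical front."""
    return [
        s for s in samples
        if not any(dominates(other[1], s[1]) for other in samples if other is not s)
    ]


def online_augmentation_round(prompts, generate, score) -> List[Sample]:
    """One round: generate with high reward targets, score, filter to the front.
    Surviving samples are added back to the supervised fine-tuning dataset."""
    candidates: List[Sample] = []
    for prompt in prompts:
        for text in generate(prompt, target_rewards={"helpfulness": 1.0, "harmlessness": 1.0}):
            candidates.append((text, score(prompt, text)))
    return pareto_front(candidates)
```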
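At inference, user preference weights are translated into target reward values that are placed in the prompt, mirroring the training-time conditioning. The mapping below is an illustrative assumption (linear interpolation between each reward's observed minimum and maximum), not the paper's derived preference-to-reward mapping, which comes from the abstracted convex optimization problem.

```python
# Illustrative preference-to-reward mapping for inference-time adjustment.
# The linear-interpolation rule is an assumption; the paper derives its
# mappings analytically from an abstracted convex optimization problem.
from typing import Dict, Tuple


def preference_to_rewards(
    weights: Dict[str, float],
    observed: Dict[str, Tuple[float, float]],  # (min, max) seen in training data
) -> Dict[str, float]:
    """Map preference weights to target reward values: a higher weight pushes
    that objective's target toward its observed maximum."""
    total = sum(weights.values())
    targets = {}
    for name, w in weights.items():
        lo, hi = observed[name]
        targets[name] = lo + (w / total) * (hi - lo)
    return targets


def build_conditioned_prompt(prompt: str, targets: Dict[str, float]) -> str:
    """Prepend target reward tags so generation is steered toward them."""
    tags = " ".join(f"<{k}> {v:.2f}" for k, v in sorted(targets.items()))
    return f"{tags} {prompt}"


if __name__ == "__main__":
    targets = preference_to_rewards(
        weights={"helpfulness": 0.7, "harmlessness": 0.3},
        observed={"helpfulness": (0.0, 1.0), "harmlessness": (0.0, 1.0)},
    )
    print(build_conditioned_prompt("User: Summarize this article.\nAssistant:", targets))
```

Note that a naive interpolation like this can place targets in the interior of the reward space; the paper's motivation for deriving the mapping from an optimization problem is precisely to land on non-dominated points of the empirical front.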