Aligning to Thousands of Preferences via System Message Generalization


28 May 2024 | Seongyun Lee, Sue Hyun Park, Seungone Kim, Minjoon Seo
This paper introduces a novel approach to aligning large language models (LLMs) with diverse user preferences without requiring continual retraining for each individual preference. The key idea is to train LLMs on a wide variety of system messages that reflect different user preferences, enabling the model to generalize to unseen preferences. To achieve this, the authors create the MULTIFACETED COLLECTION, a preference dataset of 65,000 user instructions, each paired with three distinct system messages (192,000 unique system messages in total) and corresponding responses. The dataset is generated by combining preferences across multiple dimensions and using GPT-4 Turbo to write the system messages and reference answers.

The authors train a 7B LLM called JANUS on the MULTIFACETED COLLECTION and evaluate its performance on five benchmarks: AlpacaEval 2.0, FLASK, Koala, MT-Bench, and Self-Instruct. JANUS achieves tie+win rates of 75.2%, 72.4%, and 66.4% against Mistral 7B Instruct v0.2, GPT-3.5 Turbo, and GPT-4, respectively.
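To make the dataset construction more concrete, the sketch below shows one way a multifaceted training example could be assembled: preference descriptions from several dimensions are merged into a single system message and paired with an instruction and a reference response. The dimension names, preference texts, and chat format are illustrative assumptions, not the authors' exact prompts or data schema.

```python
# Hypothetical sketch of assembling one multifaceted training example.
# Dimension names, preference texts, and the prompt format are assumptions
# for illustration; they are not the paper's exact data schema.

from dataclasses import dataclass

@dataclass
class TrainingExample:
    system_message: str
    instruction: str
    response: str

def build_system_message(preferences: dict[str, str]) -> str:
    """Combine per-dimension preference descriptions into one system message."""
    lines = ["You are an assistant that tailors its answers to the user's preferences:"]
    for dimension, description in preferences.items():
        lines.append(f"- {dimension}: {description}")
    return "\n".join(lines)

# Example preference combination drawn from multiple dimensions (illustrative values).
preferences = {
    "style": "Respond concisely, using bullet points where possible.",
    "background knowledge": "Assume the reader is new to machine learning.",
    "informativeness": "Prioritize practical takeaways over theory.",
    "harmlessness": "Avoid speculative medical or legal advice.",
}

example = TrainingExample(
    system_message=build_system_message(preferences),
    instruction="Explain how preference alignment of language models works.",
    response="<reference answer written by a stronger model, e.g. GPT-4 Turbo>",
)

# During supervised fine-tuning, the triple is rendered with the model's chat
# template so the system message conditions the target response.
print(example.system_message)
```

During training, varying the system message per instruction is what pushes the model to condition its behavior on stated preferences rather than memorizing a single "house style."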
Additionally, on three benchmarks focused on response helpfulness (AlpacaEval 2.0, MT-Bench, and Arena Hard Auto v0.1), JANUS outperforms LLaMA 3 8B Instruct by margins of +4.0%, +0.1%, and +3.0%, demonstrating that training with a vast array of system messages can also enhance alignment with the general public's preferences. The authors further show that JANUS generates diverse and safe responses, with lower toxicity than comparable models. Ablation studies and analyses of JANUS's behavior confirm that it generalizes to unseen system prompts and benefits from incorporating multifaceted system messages during training.

Furthermore, JANUS can serve as an efficient personalized reward model, aligning with diverse values and providing tailored rewards for individual user requests. The results indicate that JANUS generalizes across system messages and produces diverse responses while remaining harmless. The authors believe this work contributes to developing AI systems that respect both the values of the majority and individual user preferences. The code, dataset, benchmark, and models are available at https://github.com/kaistAI/Janus.
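As a rough illustration of the personalized-reward-model idea, the sketch below scores candidate responses by their length-normalized log-likelihood under a system-message-conditioned model and picks the best of N. The checkpoint name, prompt format, and likelihood-based scoring are assumptions for illustration, not the authors' exact evaluation setup.

```python
# Minimal sketch: best-of-N selection with a system-message-conditioned model used as a
# personalized reward model. Checkpoint name, prompt format, and scoring rule are
# assumptions for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "kaist-ai/janus-7b"  # assumed checkpoint name; verify on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def response_score(system_message: str, instruction: str, response: str) -> float:
    """Average log-probability of the response tokens, conditioned on the system message."""
    prompt = f"{system_message}\n\nUser: {instruction}\nAssistant: "
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-prob of each token given its prefix; keep only the response span
    # (assumes the prompt tokenization is a prefix of the full tokenization).
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    response_start = prompt_ids.shape[1] - 1
    return token_log_probs[0, response_start:].mean().item()

def best_of_n(system_message: str, instruction: str, candidates: list[str]) -> str:
    """Return the candidate the personalized reward model prefers."""
    return max(candidates, key=lambda c: response_score(system_message, instruction, c))
```

Because the reward is just a forward pass of the same policy model conditioned on the user's system message, no separate reward model needs to be trained per preference.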