28 May 2024 | Seongyun Lee*, Sue Hyun Park*, Seungone Kim, Minjoon Seo
The paper "Aligning to Thousands of Preferences via System Message Generalization" addresses the challenge of aligning large language models (LLMs) with diverse user preferences without the need for retraining for each individual preference. The authors propose a new paradigm where users specify their values within the system message, guiding the LLM's generation behavior to better align with their intentions. To improve the LLM's ability to generalize to diverse, unseen system messages, they create the MULTIFACETED COLLECTION, a preference dataset with 192k combinations of values beyond generic helpfulness and harmlessness, spanning 65k user instructions. Using this dataset, they train a 7B LLM called JANUS and test it on 921 prompts from 5 benchmarks. JANUS achieves a tie+win rate of 75.2%, 72.4%, and 66.4% against Mistral 7B Instruct v0.2, GPT-3.5 Turbo, and GPT-4, respectively. Surprisingly, JANUS also outperforms LLaMA 3 8B Instruct by a margin of +4.0%, +0.1%, and +3.0% in response helpfulness benchmarks, demonstrating that training with a vast array of system messages enhances alignment with general public preferences. The paper includes detailed experimental results, analyses, and discussions on the effectiveness of JANUS in generating personalized responses, maintaining diversity, and ensuring safety.The paper "Aligning to Thousands of Preferences via System Message Generalization" addresses the challenge of aligning large language models (LLMs) with diverse user preferences without the need for retraining for each individual preference. The authors propose a new paradigm where users specify their values within the system message, guiding the LLM's generation behavior to better align with their intentions. To improve the LLM's ability to generalize to diverse, unseen system messages, they create the MULTIFACETED COLLECTION, a preference dataset with 192k combinations of values beyond generic helpfulness and harmlessness, spanning 65k user instructions. Using this dataset, they train a 7B LLM called JANUS and test it on 921 prompts from 5 benchmarks. JANUS achieves a tie+win rate of 75.2%, 72.4%, and 66.4% against Mistral 7B Instruct v0.2, GPT-3.5 Turbo, and GPT-4, respectively. Surprisingly, JANUS also outperforms LLaMA 3 8B Instruct by a margin of +4.0%, +0.1%, and +3.0% in response helpfulness benchmarks, demonstrating that training with a vast array of system messages enhances alignment with general public preferences. The paper includes detailed experimental results, analyses, and discussions on the effectiveness of JANUS in generating personalized responses, maintaining diversity, and ensuring safety.