MetaAligner is a novel method for multi-objective preference alignment of large language models (LLMs), designed to be policy-agnostic and generalizable. It enables plug-and-play alignment by decoupling parameter updates from the policy models, and it supports zero-shot alignment with unseen objectives through in-context learning. The method performs conditional weak-to-strong correction, rewriting weak responses so that they approach strong responses, which lets it align with a wide range of objectives without requiring explicit reward values. MetaAligner is trained on dynamic multi-objective datasets that allow the alignment objectives to be adjusted flexibly during training; the model can therefore adapt to new objectives simply by updating the objective descriptions in its prompts and relying on in-context learning for new alignment strategies. Experimental results show that MetaAligner achieves significant and balanced improvements in multi-objective alignment on 10 state-of-the-art policy models while requiring up to 15.71× fewer GPU training hours than previous alignment methods. It also aligns effectively with unseen objectives, marking a first step towards generalizable multi-objective preference alignment.
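To make the plug-and-play, prompt-conditioned correction concrete, the sketch below shows how a MetaAligner-style corrector could be invoked on top of an arbitrary policy model. The prompt template, objective descriptions, and checkpoint path are illustrative assumptions rather than the exact format released by the authors; only the weak-to-strong correction pattern itself is taken from the paper.

```python
# Minimal sketch of plug-and-play correction with a MetaAligner-style model.
# Assumptions: the checkpoint path and the prompt template are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "path/to/metaaligner-checkpoint"  # hypothetical checkpoint location

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)

def correct(query: str, weak_response: str, objectives: dict[str, str]) -> str:
    """Rewrite a policy model's weak response so it better satisfies the listed
    objectives, without updating the policy model's parameters."""
    objective_text = "; ".join(f"{name}: {desc}" for name, desc in objectives.items())
    prompt = (
        f"Objectives: {objective_text}\n"
        f"Question: {query}\n"
        f"Original answer: {weak_response}\n"
        "Improved answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Drop the prompt tokens and return only the newly generated correction.
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example: align a response produced by any upstream policy model (including an API model).
objectives = {
    "harmlessness": "avoid unsafe or harmful content",
    "helpfulness": "directly and completely address the question",
}
print(correct("How do I secure my home Wi-Fi?", "Just use any password.", objectives))
```

Because the correction is driven entirely by the prompt and the weak response, the same corrector can sit behind open-weight or API-based policy models without any retraining of those models.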
The project is open-sourced at https://github.com/SteveKGYang/MetaAligner. The paper makes three main contributions: (1) it proposes MetaAligner, the first policy-agnostic method for multi-objective preference alignment; (2) it applies MetaAligner to zero-shot preference alignment on unseen objectives; and (3) it evaluates MetaAligner on three preference alignment datasets. The results show that MetaAligner can simultaneously perform effective alignment for 6 unseen objectives while maintaining performance on the aligned objectives, and that it significantly improves the outputs of API-based models such as GPT-3.5 and Claude-3-Sonnet. The paper also discusses limitations and future work, including the additional computational cost that MetaAligner introduces at inference time and the need to further explore the model's scalability. It concludes that MetaAligner is a promising method for generalizable multi-objective preference alignment, with potential applications in low-resource scenarios and for aligning with a wide range of objectives.
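The dynamic multi-objective training and the zero-shot extension to unseen objectives can be pictured with a small sketch: each training sample carries a randomly subsampled set of objective descriptions, and at inference an unseen objective is handled by adding its plain-text description to the same prompt slot. The sample format, objective pool, and helper names below are assumptions for illustration, not the authors' released data pipeline.

```python
import random

# Hypothetical pool of objective descriptions available during training.
TRAIN_OBJECTIVES = {
    "harmlessness": "avoid unsafe or harmful content",
    "helpfulness": "directly and completely address the question",
    "correctness": "state only factually accurate information",
}

def make_training_sample(query: str, weak: str, strong: str, rng=random) -> dict:
    """Build one dynamic multi-objective sample: a random subset of objectives
    conditions the weak-to-strong correction target."""
    chosen = rng.sample(sorted(TRAIN_OBJECTIVES), k=rng.randint(1, len(TRAIN_OBJECTIVES)))
    objective_text = "; ".join(f"{name}: {TRAIN_OBJECTIVES[name]}" for name in chosen)
    prompt = (
        f"Objectives: {objective_text}\nQuestion: {query}\n"
        f"Original answer: {weak}\nImproved answer:"
    )
    return {"prompt": prompt, "target": strong}

# Zero-shot extension: an objective never seen during training is expressed the
# same way, as a text description appended to the objective list at inference.
unseen_objective = {"humour": "keep a light, friendly tone where appropriate"}
```

Varying the objective subset per sample is what lets the corrector treat objectives as interchangeable text conditions, which is the property the zero-shot results on unseen objectives rely on.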