29 Jun 2024 | Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du
The paper introduces Multi-Objective Decoding (MOD), a decoding-time algorithm for aligning language models (LMs) with multiple objectives. MOD combines the next-token predictions of multiple base models, each trained for a single objective, to generate output tokens. The key contribution is a closed-form solution derived via the Legendre transform, which enables efficient decoding and precise control over the trade-off between objectives. The method is theoretically grounded: the authors show why existing approaches can be sub-optimal and provide optimality guarantees for MOD. Empirically, MOD achieves significant improvements over parameter-merging baselines and other multi-objective alignment methods. It is versatile, applying to a range of tasks and models, and handles both supervised fine-tuned and pre-trained models. The paper also includes a theoretical analysis, proving the necessity of strong barrier functions and discussing how sub-optimality errors propagate. Overall, MOD offers a training-free, efficient, and versatile solution for multi-objective LM alignment.
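To make the combination idea concrete, here is a minimal sketch of decoding-time multi-objective combination as a weighted geometric mixture of per-objective next-token distributions, renormalized in log space. This is a simplification for illustration: the paper's exact closed form is derived via the Legendre transform of a barrier function, and the function name `mod_combine` and its arguments are hypothetical, not from the paper.

```python
import math

def mod_combine(per_objective_logprobs, weights):
    """Weighted geometric mixture of next-token distributions.

    per_objective_logprobs: one normalized log-prob vector per
    objective-specific model, all over the same vocabulary.
    weights: one preference weight per objective.

    Returns normalized log-probs for the combined policy. This is an
    illustrative sketch, not the paper's exact closed-form solution.
    """
    vocab_size = len(per_objective_logprobs[0])
    # Weighted sum of log-probs == log of the geometric mixture.
    combined = [
        sum(w * lp[t] for w, lp in zip(weights, per_objective_logprobs))
        for t in range(vocab_size)
    ]
    # Renormalize with a numerically stable log-sum-exp.
    m = max(combined)
    log_z = m + math.log(sum(math.exp(c - m) for c in combined))
    return [c - log_z for c in combined]
```

At each decoding step, one would call `mod_combine` on the base models' next-token log-probabilities and sample from the result; setting all weight on a single objective recovers that model's distribution.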