ARGS: ALIGNMENT AS REWARD-GUIDED SEARCH

23 Jan 2024 | Maxim Khanov, Jirayu Burapacheep, Yixuan Li
ARGS (Alignment as Reward-Guided Search) is a novel framework that aligns large language models with human objectives by integrating alignment directly into the decoding process. Unlike traditional reinforcement learning from human feedback (RLHF), ARGS eliminates the need for expensive and unstable RL training. Instead, it uses a reward signal to adjust the model's probabilistic predictions at each decoding step, generating text that is both semantically coherent and aligned with human preferences.

ARGS demonstrates consistent improvements in average reward across various alignment tasks and model dimensions, outperforming baselines by 19.56% in GPT-4 evaluations. The framework is flexible, compatible with different models and tasks, and can be adapted to evolving preference datasets without extensive retraining. By shifting alignment to a post-training, decoding-time adjustment, ARGS offers a new perspective that can enhance the responsiveness and safety of AI systems. The code for ARGS is publicly available at <https://github.com/deeplearning-wisc/args>.
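The core decoding idea can be sketched concretely: at each step, the top-k next-token candidates under the language model are rescored by adding a weighted reward of the extended prefix, and generation follows the highest-scoring token (a greedy variant). The toy sketch below illustrates this; `lm_next_token_logprobs`, `reward`, the tiny vocabulary, and the weight `w` are hypothetical stand-ins for illustration, not the paper's actual models or released code.

```python
# Minimal sketch of reward-guided greedy decoding in the spirit of ARGS.
# The LM and reward model here are toy placeholders (assumptions, not the
# paper's implementation) so the example runs self-contained.

import math

VOCAB = ["safe", "helpful", "rude", "</s>"]

def lm_next_token_logprobs(prefix):
    """Hypothetical language model: uniform log-probs over a toy vocabulary."""
    return {tok: math.log(1.0 / len(VOCAB)) for tok in VOCAB}

def reward(tokens):
    """Hypothetical reward model: penalizes continuations containing 'rude'."""
    return -1.0 if "rude" in tokens else 1.0

def args_greedy_decode(prompt_tokens, w=1.0, k=3, max_new_tokens=8):
    """At each step, rescore the LM's top-k candidates with the reward of the
    extended prefix and commit to the best one:
        score(v) = log p(v | prefix) + w * reward(prefix + v)
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logprobs = lm_next_token_logprobs(tokens)
        topk = sorted(logprobs, key=logprobs.get, reverse=True)[:k]
        best = max(topk, key=lambda v: logprobs[v] + w * reward(tokens + [v]))
        tokens.append(best)
        if best == "</s>":
            break
    return tokens

print(args_greedy_decode(["Assistant:"]))
```

In a real setting, the log-probabilities would come from the base language model and the reward from a trained preference reward model; increasing `w` trades raw fluency for stronger alignment with the reward signal.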