24 May 2024 | Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel
The paper introduces Decoding-time Realignment (DeRa), a method for adjusting the strength of KL regularization at decoding time in language model alignment. Traditional approaches to finding a good regularization level require retraining multiple models with different regularization strengths, which is resource-intensive, especially for large models. DeRa lets users explore and evaluate different regularization strengths on already-aligned models without retraining, giving direct control over the degree of alignment.

The method builds on a variational perspective of the KL-regularized alignment objective: models aligned with different KL regularization strengths are geometric mixtures of a reference model and a single aligned model. DeRa exploits this fact to approximate these mixtures autoregressively during decoding, token by token. The approach is simple to implement and has a clear interpretation.

Experiments show that DeRa identifies effective regularization strengths, streamlines hyperparameter tuning, and reduces computational cost. It applies to a range of alignment methods and tasks, including summarization, hallucination mitigation, and dialogue, and is particularly useful for controlling the level of alignment and improving downstream performance.
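Concretely, if pi_ref is the reference (e.g., SFT) model and pi_aligned was trained with KL strength beta, the model aligned at strength beta/lambda is proportional to pi_ref(y|x)^(1-lambda) * pi_aligned(y|x)^lambda. DeRa approximates this geometric mixture per token by blending the two models' next-token log-probabilities with weight lambda before sampling. The sketch below illustrates that idea; it is a minimal illustration rather than the authors' implementation, and it assumes two causal language models exposed through a placeholder `next_token_logits(model_name, ids)` callable that returns next-token logits for a given prefix.

```python
import torch

@torch.no_grad()
def dera_sample(next_token_logits, prompt_ids, lam, max_new_tokens=64, eos_id=None):
    """Decoding-time realignment sketch: sample each token from a geometric
    mixture of a reference model and an aligned model.

    next_token_logits(model_name, ids) -> 1D tensor of vocabulary logits for
    the next token given prefix `ids` (placeholder interface, not a real API).
    lam = 0 recovers the reference model, lam = 1 the aligned model; values
    in between (or above 1) interpolate or extrapolate the alignment strength.
    """
    ids = prompt_ids.clone()
    for _ in range(max_new_tokens):
        z_ref = next_token_logits("ref", ids)          # reference-model logits
        z_aligned = next_token_logits("aligned", ids)  # aligned-model logits
        # Per-token geometric mixture:
        # softmax(lam * log pi_aligned + (1 - lam) * log pi_ref)
        log_mix = (lam * torch.log_softmax(z_aligned, dim=-1)
                   + (1.0 - lam) * torch.log_softmax(z_ref, dim=-1))
        probs = torch.softmax(log_mix, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=-1)
        if eos_id is not None and next_id.item() == eos_id:
            break
    return ids
```

Because lambda enters only at decoding time, a single aligned checkpoint can be evaluated across a sweep of lambda values, which is how DeRa streamlines the regularization-strength search described above before any retraining is committed to.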