30 May 2024 | Souradip Chakraborty*1, Soumya Suvra Ghosal*1, Ming Yin2, Dinesh Manocha1, Mengdi Wang2, Amrit Singh Bedi†3, and Furong Huang†1
The paper "Transfer Q*: Principled Decoding for LLM Alignment" addresses the challenge of aligning large language models (LLMs) with a specific target reward without updating their parameters, since fine-tuning is computationally intensive and resource-demanding. The authors propose Transfer Q*, a decoding method that estimates the optimal value function for the target reward using a baseline model that has been aligned with a different reward. According to the authors, this approach reduces the sub-optimality gap observed in previous decoding methods and improves coherence, diversity, and quality across a range of synthetic and real datasets. The paper provides theoretical analyses that characterize the optimality of Transfer Q* and offers practical guidelines for hyperparameter tuning. Experimental results show that Transfer Q* outperforms existing state-of-the-art decoding strategies, in both the direct and the indirect transfer settings, on synthetic and real-world tasks.
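The summary above does not spell out the decoding rule, so the following is only a toy sketch of value-guided decoding in general, not the authors' implementation. It assumes that a value estimate for each candidate token is obtained by letting a baseline policy finish the response and scoring the result with the target reward, and that the reference model's next-token probabilities are then reweighted by `exp(Q / alpha)`. The names `estimate_q`, `reweight`, `toy_rollout`, `toy_reward`, and the temperature-like parameter `alpha` are all hypothetical and chosen for illustration.

```python
import math

def estimate_q(prefix, token, baseline_rollout, target_reward):
    """Toy value estimate (assumption): complete the response with a
    baseline policy, then score the full response with the target reward."""
    full_response = baseline_rollout(prefix + [token])
    return target_reward(full_response)

def reweight(ref_probs, q_values, alpha):
    """Return a distribution proportional to ref_probs[t] * exp(q_values[t] / alpha)."""
    top = max(q_values.values())  # subtract the max for numerical stability
    weights = {
        tok: p * math.exp((q_values[tok] - top) / alpha)
        for tok, p in ref_probs.items()
    }
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

# Hypothetical stand-ins for a baseline policy and a target reward model.
def toy_rollout(tokens):
    return tokens + ["<eos>"]

def toy_reward(tokens):
    return float(tokens.count("helpful"))

# Next-token probabilities from a hypothetical reference model.
ref_probs = {"helpful": 0.2, "vague": 0.8}
q_values = {
    tok: estimate_q(["be"], tok, toy_rollout, toy_reward) for tok in ref_probs
}
guided = reweight(ref_probs, q_values, alpha=0.5)
```

In this toy setup the reference model prefers "vague", but the reweighted distribution `guided` shifts most of the probability mass to "helpful" because it scores higher under the target reward. A smaller `alpha` makes the value estimate dominate; a larger `alpha` keeps the result closer to the reference model.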