Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

19 Nov 2024 | Zhanhui Zhou*, Zhixuan Liu*, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao
The paper introduces *weak-to-strong search*, a method for aligning large language models (LLMs) with human preferences. It frames alignment as a test-time greedy search that samples from the frozen large model while maximizing the log-probability difference between a small tuned model and its untuned counterpart (a sketch of this procedure follows below). The key contributions are:

1. **Model Up-Scaling Strategy**: a compute-efficient approach to aligning large models without fine-tuning them directly, by leveraging small models as steering forces.
2. **Weak-to-Strong Generalization**: strong models are enhanced with weak test-time guidance, demonstrating the potential for weak-to-strong generalization.

The method is evaluated on controlled sentiment generation, summarization, and instruction-following. Empirical results show that weak-to-strong search improves the alignment of large models using only test-time guidance from small models: it outperforms other test-time approaches on controlled sentiment generation and summarization, and consistently improves large instruction-tuned models on the AlpacaEval 2.0 benchmark. The paper also discusses limitations and future directions, including applying weak-to-strong search to tasks beyond human preference alignment, such as reasoning and coding.
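To make the search concrete, here is a minimal Python sketch of the greedy, chunk-level procedure described above, written against the Hugging Face `transformers` API. The model names, chunk sizes, and sample counts are illustrative assumptions rather than the paper's exact configuration; only the scoring rule (log-probability of the small tuned model minus that of the small untuned model, applied to text sampled from the frozen large model) follows the summary above.

```python
# A minimal sketch of weak-to-strong search as summarized above: greedily
# extend a response chunk by chunk, sampling candidate chunks from the
# frozen large model and keeping the one that maximizes the small models'
# log-probability difference. Model names and hyperparameters here are
# illustrative assumptions, not the paper's configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LARGE = "meta-llama/Llama-2-7b-hf"   # frozen large model to steer (assumed)
SMALL_TUNED = "path/to/gpt2-tuned"   # small model tuned on preferences (assumed)
SMALL_BASE = "gpt2"                  # the same small model before tuning

large_tok = AutoTokenizer.from_pretrained(LARGE)
small_tok = AutoTokenizer.from_pretrained(SMALL_BASE)
large = AutoModelForCausalLM.from_pretrained(LARGE).eval()
tuned = AutoModelForCausalLM.from_pretrained(SMALL_TUNED).eval()
base = AutoModelForCausalLM.from_pretrained(SMALL_BASE).eval()

@torch.no_grad()
def seq_logprob(model, tokenizer, text):
    """Total log-probability of `text` under `model` (teacher forcing)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits[:, :-1]            # position t predicts token t+1
    logps = torch.log_softmax(logits, dim=-1)
    return logps.gather(-1, ids[:, 1:, None]).sum().item()

@torch.no_grad()
def weak_to_strong_search(prompt, n_chunks=8, chunk_tokens=32, n_samples=4):
    """Greedy test-time search steered by the small tuned/untuned pair."""
    text = prompt
    for _ in range(n_chunks):
        ids = large_tok(text, return_tensors="pt").input_ids
        outs = large.generate(ids, do_sample=True,
                              num_return_sequences=n_samples,
                              max_new_tokens=chunk_tokens,
                              pad_token_id=large_tok.eos_token_id)
        chunks = [large_tok.decode(o[ids.shape[1]:], skip_special_tokens=True)
                  for o in outs]

        # Score each candidate by log p_tuned - log p_untuned. The prompt's
        # own log-probability is shared by all candidates, so scoring the
        # full text leaves the argmax unchanged.
        def score(chunk):
            full = text + chunk
            return (seq_logprob(tuned, small_tok, full)
                    - seq_logprob(base, small_tok, full))

        text = text + max(chunks, key=score)
    return text

print(weak_to_strong_search("Write a short, upbeat movie review: "))
```

Because the small models only re-score decoded text, they need not share a tokenizer with the large model; increasing `n_samples` or shrinking `chunk_tokens` trades extra compute for finer-grained steering.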