23 Aug 2024 | Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith
The paper introduces proxy-tuning, a lightweight decoding-time algorithm that operates on top of black-box large language models (LLMs) to approximate the effect of directly tuning them, without accessing the models' internal parameters. Proxy-tuning tunes a smaller LM and then applies the difference between the predictions of the small tuned and untuned LMs to shift the predictions of the larger untuned model. This retains the benefits of larger-scale pretraining while being more resource-efficient and applicable to proprietary models. Experiments show that proxy-tuning can close 88% of the gap between a large LLM and its fully tuned version on knowledge, reasoning, and safety benchmarks. The method is also effective for domain adaptation on code and for task-specific finetuning on question answering and math problems. Additionally, proxy-tuning is applied to a truly black-box LM, GPT-3.5, for temporal adaptation, improving its accuracy on questions about recent events. Overall, proxy-tuning demonstrates the promise of using small tuned LMs to efficiently customize large, potentially proprietary LMs through decoding-time guidance.
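A minimal sketch of the decoding-time arithmetic described above, assuming all three models share a vocabulary so their logits are aligned: the large base model's logits are shifted by the difference between a small tuned "expert" and its small untuned counterpart, and the next token is sampled from the softmax of the result. Function names and the random toy logits are illustrative, not from the paper's released code.

```python
import torch
import torch.nn.functional as F

def proxy_tuned_logits(base_large, tuned_small, untuned_small):
    # Shift the large untuned model's logits by the difference between
    # the small tuned model and the small untuned model.
    return base_large + (tuned_small - untuned_small)

def proxy_tuned_next_token(base_large, tuned_small, untuned_small):
    # Sample the next token from the softmax over the shifted logits.
    shifted = proxy_tuned_logits(base_large, tuned_small, untuned_small)
    probs = F.softmax(shifted, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Toy usage with random logits over a hypothetical 32,000-token vocabulary.
vocab_size = 32_000
base = torch.randn(vocab_size)     # large, untuned model
expert = torch.randn(vocab_size)   # small, tuned model
anti = torch.randn(vocab_size)     # small, untuned model
next_token = proxy_tuned_next_token(base, expert, anti)
```

In practice the three logit vectors would come from forward passes of the respective models on the same prefix at each decoding step; only the output distributions of the large model are needed, not its weights.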