Tuning Language Models by Proxy


2024 | Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith
Proxy-tuning is a lightweight decoding-time algorithm that adapts large language models (LLMs) without accessing their internal parameters. It uses the predictions of a small tuned model (the expert) and its untuned counterpart (the anti-expert) to adjust the output of a larger, untuned model, shifting the larger model's predictions toward the desired behavior while retaining the benefits of large-scale pretraining.

Proxy-tuning has been tested on instruction-following, domain adaptation, and task-specific finetuning. When applied to LLaMA2-70B, it closed 88% of the performance gap with the directly tuned CHAT version across knowledge, reasoning, and safety benchmarks. It also improved performance on coding tasks and question-answering, with proxy-tuned models outperforming the smaller tuned experts themselves. The method has further been used to adapt models to code, to improve performance on math problems, and to temporally adapt GPT-3.5, updating its knowledge of recent events.

Because it only requires access to a model's output predictions, proxy-tuning is effective even when model weights are private. It offers an efficient way to customize large LMs across a wide range of tasks and domains, and allows fine-grained control over the amount of guidance applied during decoding, letting users balance different aspects of generation. Overall, proxy-tuning demonstrates the potential of using small tuned models to customize larger, potentially proprietary LMs through decoding-time guidance.
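At each decoding step, proxy-tuning adds the logit difference between the expert and the anti-expert to the base model's logits before taking the softmax. A minimal NumPy sketch of that logit arithmetic, using a toy vocabulary and an illustrative guidance-scale parameter `alpha` (the function name and the scaling knob are assumptions for illustration, not the paper's reference implementation):

```python
import numpy as np

def proxy_tuned_probs(base_logits, expert_logits, antiexpert_logits, alpha=1.0):
    """Shift the large base model's next-token distribution by the
    expert-minus-anti-expert logit difference, then renormalize.
    alpha (illustrative) scales how much guidance is applied."""
    shifted = base_logits + alpha * (expert_logits - antiexpert_logits)
    exp = np.exp(shifted - shifted.max())  # numerically stable softmax
    return exp / exp.sum()

# Toy example over a 4-token vocabulary.
base   = np.array([2.0, 1.0, 0.5, 0.1])  # large untuned model: prefers token 0
expert = np.array([0.5, 2.5, 0.2, 0.1])  # small tuned model: prefers token 1
anti   = np.array([0.5, 0.8, 0.2, 0.1])  # small untuned counterpart

probs = proxy_tuned_probs(base, expert, anti)
```

Here the expert/anti-expert contrast boosts token 1 enough to override the base model's preference for token 0, which is the intended effect: tuning-induced shifts from the small model pair steer the large model's decoding.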