Purifying Large Language Models by Ensembling a Small Language Model

19 Feb 2024 | Tianlin Li, Qian Liu, Tianyu Pang, Chao Du, Qing Guo, Yang Liu, and Min Lin
This paper proposes purifying large language models (LLMs) by ensembling them with small, benign language models (SLMs). The approach builds on the CP-Δ algorithm, originally designed for provable copyright protection, to mitigate negative effects of uncurated training data such as copyright infringement, data poisoning, and privacy violations. By combining an untrusted LLM with a benign SLM, the method reduces the influence of uncurated data while preserving the LLM's standard performance.

The ensemble algorithm uses KL divergence as the distribution metric and adjusts the relative weights of the LLM and SLM to balance performance against purifying strength. Experiments on nine LLMs, including widely used code models such as StarCoder and CodeLlama as well as general language models such as Llama2 and Pythia, show that the ensemble strategy significantly reduces copyright infringement, data poisoning, and PII leakage without substantially degrading standard performance.

Because the strategy applies across models and tasks, is efficient and easy to implement, and can be combined with other model-enhancement techniques, the authors argue it is a versatile and promising approach for purifying LLMs in real-world applications.
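To make the weighting concrete, below is a minimal sketch of one decoding step. It assumes the ensemble takes the form a CP-Δ-style combination has under the KL metric: a weighted geometric mean of the two next-token distributions, i.e., a weighted sum of log-probabilities followed by renormalization. The function name, the `beta` weight, and the greedy-decoding demo are illustrative assumptions rather than the paper's exact interface, and the two models are assumed to share a tokenizer and vocabulary so their logits align element-wise.

```python
import torch
import torch.nn.functional as F


def ensemble_next_token_logprobs(llm_logits: torch.Tensor,
                                 slm_logits: torch.Tensor,
                                 beta: float = 0.5) -> torch.Tensor:
    """Combine next-token distributions of an untrusted LLM and a benign SLM.

    Under the KL divergence metric, a CP-Delta-style ensemble reduces to a
    weighted geometric mean of the two distributions: a weighted sum of
    log-probabilities, renormalized over the vocabulary. `beta` is the LLM
    weight (an assumed knob, not the paper's notation): beta=1 recovers the
    raw LLM, beta=0 the pure SLM; smaller beta purifies more aggressively
    at some cost to the LLM's standard performance.
    """
    llm_logp = F.log_softmax(llm_logits, dim=-1)   # log P_LLM(y | x)
    slm_logp = F.log_softmax(slm_logits, dim=-1)   # log P_SLM(y | x)
    mixed = beta * llm_logp + (1.0 - beta) * slm_logp
    return F.log_softmax(mixed, dim=-1)            # renormalize over vocab


if __name__ == "__main__":
    # Stand-in logits; in practice these come from the two models' forward
    # passes on the same prefix, over a shared vocabulary.
    vocab_size = 32000
    llm_logits = torch.randn(vocab_size)  # untrusted LLM's next-token scores
    slm_logits = torch.randn(vocab_size)  # benign SLM's next-token scores

    logp = ensemble_next_token_logprobs(llm_logits, slm_logits, beta=0.7)
    next_token = int(torch.argmax(logp))  # greedy pick from purified distribution
    print(next_token)
```

In a full decoding loop this combination would be applied at every step before sampling or greedy selection, which is why the method layers on top of any existing model without retraining.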