Purifying Large Language Models by Ensembling a Small Language Model

19 Feb 2024 | Tianlin Li, Qian Liu, Tianyu Pang, Chao Du, Qing Guo, Yang Liu, and Min Lin
The paper addresses the challenges posed by uncurated training data in large language models (LLMs), which can lead to copyright infringement, data poisoning, and privacy violations. To mitigate these problems, the authors propose ensembling LLMs with benign small language models (SLMs). The approach builds on the CP-Δ algorithm, which guarantees that the ensemble's output distribution diverges from the uncurated data by a specified amount, purifying the LLM's output while preserving its standard performance. The study provides theoretical guarantees and extensive experiments on nine LLMs, including popular models such as CodeLlama, Llama2, and Pythia. The experiments demonstrate that the ensemble strategy significantly reduces copyright infringement, data poisoning, and privacy leakage while maintaining or even improving the LLM's performance. The authors also examine the trade-off between model purification and standard performance, showing that adjusting the ensemble weights can balance the two. The paper concludes by discussing the advantages, limitations, and potential risks of the ensemble approach, emphasizing its potential to enable safer real-world deployment of LLMs and to drive further research in this area.
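The core mechanism is easy to sketch: at each decoding step, the next-token distribution of the LLM is mixed with that of a benign SLM. The Python sketch below is a minimal illustration of this idea, not the paper's exact method; the linear interpolation form, the weight `alpha`, the greedy decoding loop, and the two Pythia checkpoints (chosen here only because they share a tokenizer, so their distributions live over the same vocabulary) are all illustrative assumptions.

```python
# Minimal sketch of decoding-time ensembling between a large LM and a benign
# small LM. The checkpoints, the interpolation form, and the weight `alpha`
# are illustrative assumptions, not the paper's exact configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LLM_NAME = "EleutherAI/pythia-6.9b"  # stands in for the (possibly tainted) LLM
SLM_NAME = "EleutherAI/pythia-70m"   # stands in for the benign SLM

tokenizer = AutoTokenizer.from_pretrained(LLM_NAME)
llm = AutoModelForCausalLM.from_pretrained(LLM_NAME).eval()
slm = AutoModelForCausalLM.from_pretrained(SLM_NAME).eval()


@torch.no_grad()
def ensemble_generate(prompt: str, alpha: float = 0.5, max_new_tokens: int = 32) -> str:
    """Greedy decoding from an assumed linear mixture of next-token
    distributions: p = (1 - alpha) * p_llm + alpha * p_slm."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        # Next-token distributions from both models over the shared vocabulary.
        p_llm = torch.softmax(llm(input_ids=ids).logits[:, -1, :], dim=-1)
        p_slm = torch.softmax(slm(input_ids=ids).logits[:, -1, :], dim=-1)
        mixture = (1.0 - alpha) * p_llm + alpha * p_slm
        next_id = mixture.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)


print(ensemble_generate("def fibonacci(n):"))
```

In this sketch, increasing `alpha` shifts probability mass toward the benign SLM (more purification, potentially weaker standard performance), while decreasing it preserves more of the LLM's behavior, mirroring the trade-off the authors explore by adjusting ensemble weights.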