Protecting Your LLMs with Information Bottleneck

Protecting Your LLMs with Information Bottleneck

10 Oct 2024 | Zichuan Liu, Zefan Wang, Linjie Xu, Jinyu Wang, Lei Song, Tianchun Wang, Chunlin Chen, Wei Cheng, Jiang Bian
The paper introduces IBProtector, a novel defense mechanism for large language models (LLMs) against jailbreaking attacks. Based on the information bottleneck principle, IBProtector selectively compresses and perturbs prompts to preserve only essential information needed for the LLM to respond appropriately. It uses a lightweight, trainable extractor to compress prompts while minimizing mutual information between the original and compressed prompts, ensuring the compressed prompt retains sufficient information for the desired response. The method is designed to be effective against various attack methods and target LLMs without significantly affecting response quality or inference speed. Empirical evaluations show that IBProtector outperforms existing defense methods in mitigating jailbreak attempts. The approach is transferable across different LLMs and attack strategies, making it a promising solution for enhancing LLM security. The paper also discusses the limitations of the method, including the challenge of handling high-dimensional data and the need for further research on the impact of extracted information on LLM responses. Overall, IBProtector provides a robust defense mechanism that enhances the safety and reliability of LLMs.The paper introduces IBProtector, a novel defense mechanism for large language models (LLMs) against jailbreaking attacks. Based on the information bottleneck principle, IBProtector selectively compresses and perturbs prompts to preserve only essential information needed for the LLM to respond appropriately. It uses a lightweight, trainable extractor to compress prompts while minimizing mutual information between the original and compressed prompts, ensuring the compressed prompt retains sufficient information for the desired response. The method is designed to be effective against various attack methods and target LLMs without significantly affecting response quality or inference speed. Empirical evaluations show that IBProtector outperforms existing defense methods in mitigating jailbreak attempts. The approach is transferable across different LLMs and attack strategies, making it a promising solution for enhancing LLM security. The paper also discusses the limitations of the method, including the challenge of handling high-dimensional data and the need for further research on the impact of extracted information on LLM responses. Overall, IBProtector provides a robust defense mechanism that enhances the safety and reliability of LLMs.
Reach us at info@futurestudyspace.com