[slides] Protecting Your LLMs with Information Bottleneck

The paper introduces the Information Bottleneck Protector (IBProtector), a defense mechanism designed to protect large language models (LLMs) from jailbreaking attacks. These attacks aim to bypass the safety alignment of LLMs by manipulating prompts to generate harmful content. The IBProtector is based on the information bottleneck principle, which aims to compress prompts while retaining essential information for the LLM's response. The method involves a lightweight, trainable extractor that selectively compresses and perturbs prompts, ensuring that only relevant information is passed to the LLM. The paper evaluates IBProtector on various datasets and attack methods, demonstrating its effectiveness in mitigating jailbreaking attempts without significantly affecting response quality or inference speed. The defense is shown to be robust across different attack strategies and target LLMs, highlighting its potential as a transferable defense mechanism. The paper also discusses the limitations and impact of the approach, emphasizing the importance of responsible AI development and the need for robust countermeasures against malicious exploitation of LLMs.The paper introduces the Information Bottleneck Protector (IBProtector), a defense mechanism designed to protect large language models (LLMs) from jailbreaking attacks. These attacks aim to bypass the safety alignment of LLMs by manipulating prompts to generate harmful content. The IBProtector is based on the information bottleneck principle, which aims to compress prompts while retaining essential information for the LLM's response. The method involves a lightweight, trainable extractor that selectively compresses and perturbs prompts, ensuring that only relevant information is passed to the LLM. The paper evaluates IBProtector on various datasets and attack methods, demonstrating its effectiveness in mitigating jailbreaking attempts without significantly affecting response quality or inference speed. The defense is shown to be robust across different attack strategies and target LLMs, highlighting its potential as a transferable defense mechanism. The paper also discusses the limitations and impact of the approach, emphasizing the importance of responsible AI development and the need for robust countermeasures against malicious exploitation of LLMs.

Protecting Your LLMs with Information Bottleneck

10 Oct 2024 | Zichuan Liu, Zefan Wang, Linjie Xu, Jinyu Wang, Lei Song, Tianchun Wang, Chunlin Chen, Wei Cheng, Jiang Bian