12 Jun 2024 | Chris Yuhao Liu*, Yaxuan Wang, Jeffrey Flanigan†, Yang Liu†
The paper introduces Embedding-Corrupted (ECO) Prompts, a lightweight framework for unlearning knowledge from large language models (LLMs). ECO addresses the challenges of knowledge entanglement and unlearning efficiency by enforcing an unlearned state during inference. Instead of relying on the LLM itself, ECO uses a prompt classifier to identify and safeguard prompts that should be forgotten. Corruptions are learned offline via zeroth-order optimization and applied to the prompt embeddings during inference. The method demonstrates superior performance in unlearning tasks, achieving nearly zero side effects and no additional cost when scaling to larger models. The authors also highlight the scalability of ECO to 100 LLMs with parameters ranging from 0.5B to 236B. The paper includes extensive experiments on entity unlearning, hazardous knowledge unlearning, and copyrighted content unlearning, showing that ECO effectively retains model utility while forgetting the specified knowledge.
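The inference-time flow described above (classify the prompt, corrupt its embeddings only if it is flagged, and tune the corruption strength without gradients) can be illustrated with a toy sketch. This is not the authors' implementation: the keyword-based `should_forget` classifier, the Gaussian noise model, the scalar strength `sigma`, and the quadratic `toy_loss` are all assumptions chosen to keep the example self-contained. In a real setting the loss would be computed from the model's outputs and the classifier would be a trained model.

```python
import random

def should_forget(prompt, forget_terms):
    """Toy stand-in for the prompt classifier (assumption: keyword match)."""
    lowered = prompt.lower()
    return any(term in lowered for term in forget_terms)

def corrupt(embeddings, sigma, rng):
    """Add zero-mean Gaussian noise of scale sigma to every coordinate."""
    return [[x + rng.gauss(0.0, sigma) for x in vec] for vec in embeddings]

def guarded_embeddings(prompt, embeddings, forget_terms, sigma, rng):
    """Corrupt embeddings only for flagged prompts; pass others through."""
    if should_forget(prompt, forget_terms):
        return corrupt(embeddings, sigma, rng)
    return embeddings

def zeroth_order_step(loss, sigma, lr=0.1, delta=1e-2):
    """One finite-difference update of sigma.

    The slope is estimated from two black-box loss evaluations, so no
    backpropagation through the model is required.
    """
    grad_estimate = (loss(sigma + delta) - loss(sigma - delta)) / (2 * delta)
    return sigma - lr * grad_estimate

def toy_loss(sigma):
    """Hypothetical black-box objective with its minimum at sigma = 2."""
    return (sigma - 2.0) ** 2

rng = random.Random(0)
forget_terms = ["example entity"]        # hypothetical forget target
embeddings = [[0.1, 0.2], [0.3, 0.4]]    # toy prompt embeddings

# A prompt outside the forget set is returned unchanged.
kept = guarded_embeddings(
    "What is the capital of France?", embeddings, forget_terms, 1.0, rng
)
# A flagged prompt gets noisy embeddings of the same shape.
noised = guarded_embeddings(
    "Tell me about Example Entity.", embeddings, forget_terms, 1.0, rng
)

# Offline tuning of the corruption strength using loss evaluations only.
sigma = 0.0
for _ in range(50):
    sigma = zeroth_order_step(toy_loss, sigma)
```

Because the corruption is applied only to flagged prompts, unflagged prompts reach the model untouched, which is the intuition behind the low side effects the summary reports. The tuning loop needs only loss evaluations, so its cost does not depend on backpropagating through a large model.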