[slides] Large Language Model Unlearning via Embedding-Corrupted Prompts

This paper introduces ECO Prompts, a lightweight framework for unlearning knowledge from large language models (LLMs). The method addresses the challenges of knowledge entanglement and unlearning efficiency by enforcing an unlearned state at inference time. Instead of relying on the LLM itself to unlearn, ECO Prompts use a prompt classifier to identify and safeguard the prompts to be forgotten. Corruptions, learned via zeroth-order optimization, are added to the embeddings of those prompts, and the corrupted prompts are then used during inference to achieve the unlearning objective. The method is shown to produce outputs that closely approximate those of a model that has never been trained on the data intended for forgetting. Extensive experiments demonstrate that ECO Prompts achieve unlearning with minimal side effects across various domains and model sizes, covering 100 LLMs with up to 236B parameters. The method is scalable and efficient, making it a promising approach for responsible and safe AI deployment.
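
The pipeline described in the abstract (classify the prompt, corrupt its embeddings, tune the corruption with zeroth-order optimization) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the keyword-based `should_forget` stands in for a trained prompt classifier, the corruption is taken to be Gaussian noise with a single scalar strength `sigma`, and `black_box_score` is a toy function standing in for a metric that would really be measured by running the LLM on corrupted prompts. The only part that is generic is the two-point finite-difference gradient estimate, which is a standard zeroth-order technique.

```python
import random

# Hypothetical forget-set vocabulary; a real system would use a trained classifier.
FORGET_KEYWORDS = {"harry", "potter"}


def should_forget(prompt):
    """Toy prompt classifier: flag prompts that mention any forget keyword."""
    words = (w.strip(".,?!").lower() for w in prompt.split())
    return any(w in FORGET_KEYWORDS for w in words)


def corrupt(embeddings, sigma, rng):
    """Add zero-mean Gaussian noise (std = sigma) to every embedding coordinate."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in embeddings]


def prepare_embeddings(prompt, embeddings, sigma, rng):
    """Inference-time gate: corrupt only the prompts flagged for forgetting."""
    if should_forget(prompt):
        return corrupt(embeddings, sigma, rng)
    return embeddings


def black_box_score(sigma):
    """Toy stand-in for a metric observed from the model (assumed, not measured).

    It decreases as the corruption strength grows and exposes no gradient,
    which is the situation zeroth-order optimization is meant for.
    """
    return 1.0 / (1.0 + sigma)


def tune_sigma(target, sigma=0.5, mu=1e-3, lr=5.0, steps=500):
    """Tune sigma so the black-box score matches a target value.

    The gradient is estimated from two function evaluations (central
    finite difference), so no backpropagation through the model is needed.
    """
    def loss(s):
        return (black_box_score(s) - target) ** 2

    for _ in range(steps):
        grad = (loss(sigma + mu) - loss(sigma - mu)) / (2 * mu)
        sigma = max(0.0, sigma - lr * grad)  # keep the strength non-negative
    return sigma


if __name__ == "__main__":
    rng = random.Random(0)
    # The target plays the role of the score an unlearned model would obtain.
    sigma = tune_sigma(target=0.25)
    token_embeddings = [[0.1, 0.2], [0.3, 0.4]]
    for prompt in ["Who is Harry Potter?", "What is the capital of France?"]:
        out = prepare_embeddings(prompt, token_embeddings, sigma, rng)
        print(prompt, "->", "corrupted" if out is not token_embeddings else "unchanged")
```

In this sketch the model weights are never touched: benign prompts pass through unchanged, and only flagged prompts receive noisy embeddings, which mirrors the abstract's point that the unlearned state is enforced at inference rather than by retraining.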

12 Jun 2024 | Chris Yuhao Liu, Yaxuan Wang, Jeffrey Flanigan, Yang Liu