This paper introduces SEUL, a novel selective unlearning method for language models that enables fine-grained forgetting of sensitive information while preserving the model's generation capabilities. Unlike previous approaches that fully reverse the training objective, SEUL focuses on specific sequence spans rather than entire instances, reducing the negative impact on model performance. The paper also proposes two new evaluation metrics, Sensitive Extraction Likelihood (S-EL) and Sensitive Memorization Accuracy (S-MA), to assess the effectiveness of unlearning sensitive information. Additionally, the authors introduce efficient automatic online and offline methods for sensitive span annotation to support the unlearning framework and evaluation process.
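The core distinction — negating the training loss only over annotated sensitive spans rather than over the whole sequence — can be illustrated with a minimal pure-Python sketch. This is not the authors' code; the function name, the mask convention, and the sign convention are assumptions for illustration only.

```python
def span_unlearning_loss(token_log_probs, sensitive_mask):
    """Sketch of span-level selective unlearning (illustrative only).

    Full-sequence unlearning (KUL-style) performs gradient ascent on the
    negative log-likelihood of EVERY token; the selective variant applies
    it only to tokens flagged as sensitive, so non-sensitive tokens keep
    their ordinary behavior. Minimizing the value returned here pushes
    down the model's probability of the sensitive tokens only.
    """
    return sum(lp for lp, m in zip(token_log_probs, sensitive_mask) if m)

# Toy sequence: per-token log-probabilities from a model, with the
# middle token marked as sensitive.
log_probs = [-1.0, -2.0, -0.5]
mask = [0, 1, 0]

selective = span_unlearning_loss(log_probs, mask)  # only the masked token
full = span_unlearning_loss(log_probs, [1, 1, 1])  # whole-sequence variant
```

In this toy example `selective` is -2.0 while `full` is -3.5: the selective objective leaves the two non-sensitive tokens out of the unlearning signal entirely, which is why it degrades general performance less.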
The paper addresses the growing concern that neural models unintentionally memorize personal or sensitive information, a risk amplified by the rise of large language models. The "right to be forgotten" has been legislated in many countries, requiring companies to erase personal data upon user request. While removing data from backend databases is straightforward, erasing it from a trained neural model is difficult because the relationship between individual training examples and model weights is opaque.
SEUL is compared with existing methods such as KUL, which fully reverses the training objective. SEUL demonstrates superior performance in unlearning sensitive information, maintaining higher accuracy and F1 scores on dialogue datasets. The method is also more robust against adversarial extraction attacks, as it selectively unlearns specific spans rather than negating the loss over the entire sequence.
The paper evaluates SEUL on several datasets spanning classification and dialogue tasks, showing that it effectively reduces the risk of sensitive information leakage without significantly degrading the model's general performance. The authors also analyze the stability of SEUL as the number of forgetting instances varies, and find that smaller language models suffer larger performance drops when forgetting many instances at once.
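The leakage evaluation above relies on the paper's proposed S-MA metric. Its exact formula should be taken from the paper; a plausible reading — the fraction of tokens inside sensitive spans that the model reproduces exactly under teacher forcing — can be sketched as follows. The function and variable names here are hypothetical, not the authors' implementation.

```python
def sensitive_memorization_accuracy(predicted, reference, sensitive_mask):
    """Hedged sketch of an S-MA-style metric (not the paper's exact formula).

    Over tokens inside sensitive spans only, count how often the model's
    greedy prediction matches the reference token. Lower values after
    unlearning indicate the sensitive content is no longer memorized.
    """
    hits = sum(
        1
        for p, r, m in zip(predicted, reference, sensitive_mask)
        if m and p == r
    )
    total = sum(sensitive_mask)
    return hits / total if total else 0.0

# Toy example: the model still reproduces one of two sensitive tokens.
pred = ["my", "phone", "is", "555", "0000"]
ref  = ["my", "phone", "is", "555", "1234"]
mask = [0, 0, 0, 1, 1]
s_ma = sensitive_memorization_accuracy(pred, ref, mask)
```

Here `s_ma` is 0.5: non-sensitive tokens ("my", "phone", "is") do not count toward the score, which mirrors how the paper's span annotations focus evaluation on the information that actually needs to be forgotten.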
Overall, SEUL provides a more efficient and effective approach to unlearning sensitive information in language models, with a focus on preserving model capabilities and ensuring privacy. The proposed evaluation metrics and annotation methods contribute to the advancement of machine unlearning techniques in the context of large language models.