This paper introduces SEUL, a novel selective unlearning method for language models that enables fine-grained forgetting of sensitive information while preserving the model's generation capabilities. Unlike previous approaches that fully reverse the training objective, SEUL focuses on specific sequence spans rather than entire instances, reducing the negative impact on model performance. The paper also proposes two new evaluation metrics, Sensitive Extraction Likelihood (S-EL) and Sensitive Memorization Accuracy (S-MA), to assess the effectiveness of unlearning sensitive information. Additionally, the authors introduce efficient automatic online and offline methods for sensitive span annotation to support the unlearning framework and evaluation process.
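The core distinction — negating the training loss only over annotated sensitive spans rather than over the whole sequence — can be illustrated with a minimal pure-Python sketch. This is not the authors' code; the function name, the mask convention, and the sign convention are assumptions for illustration only.

```python
def span_unlearning_loss(token_log_probs, sensitive_mask):
    """Sketch of span-level selective unlearning (illustrative only).

    Full-sequence unlearning (KUL-style) performs gradient ascent on the
    negative log-likelihood of EVERY token; the selective variant applies
    it only to tokens flagged as sensitive, so non-sensitive tokens keep
    their ordinary behavior. Minimizing the value returned here pushes
    down the model's probability of the sensitive tokens only.
    """
    return sum(lp for lp, m in zip(token_log_probs, sensitive_mask) if m)

# Toy sequence: per-token log-probabilities from a model, with the
# middle token marked as sensitive.
log_probs = [-1.0, -2.0, -0.5]
mask = [0, 1, 0]

selective = span_unlearning_loss(log_probs, mask)  # only the masked token
full = span_unlearning_loss(log_probs, [1, 1, 1])  # whole-sequence variant
```

In this toy example `selective` is -2.0 while `full` is -3.5: the selective objective leaves the two non-sensitive tokens out of the unlearning signal entirely, which is why it degrades general performance less.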
The paper addresses the growing concern that neural models unintentionally memorize personal or sensitive information, a risk amplified by the rise of large language models. The "right to be forgotten" has been legislated in many countries, requiring companies to erase personal data upon user request. While removing data from backend databases is straightforward, erasing it from a trained neural model is difficult because the relationship between individual training examples and model weights is opaque.
SEUL is compared with existing methods such as KUL, which fully reverses the training objective. SEUL demonstrates superior performance in unlearning sensitive information, maintaining higher accuracy and F1 scores on dialogue datasets. The method is also more robust against adversarial extraction attacks, as it selectively unlearns specific spans rather than negating the loss over the entire sequence.
The paper evaluates SEUL on several datasets spanning classification and dialogue tasks, showing that it effectively reduces the risk of sensitive information leakage without significantly degrading the model's general performance. The authors also analyze the stability of SEUL as the number of forgetting instances varies, and find that smaller language models suffer larger performance drops when forgetting many instances at once.
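The leakage evaluation above relies on the paper's proposed S-MA metric. Its exact formula should be taken from the paper; a plausible reading — the fraction of tokens inside sensitive spans that the model reproduces exactly under teacher forcing — can be sketched as follows. The function and variable names here are hypothetical, not the authors' implementation.

```python
def sensitive_memorization_accuracy(predicted, reference, sensitive_mask):
    """Hedged sketch of an S-MA-style metric (not the paper's exact formula).

    Over tokens inside sensitive spans only, count how often the model's
    greedy prediction matches the reference token. Lower values after
    unlearning indicate the sensitive content is no longer memorized.
    """
    hits = sum(
        1
        for p, r, m in zip(predicted, reference, sensitive_mask)
        if m and p == r
    )
    total = sum(sensitive_mask)
    return hits / total if total else 0.0

# Toy example: the model still reproduces one of two sensitive tokens.
pred = ["my", "phone", "is", "555", "0000"]
ref  = ["my", "phone", "is", "555", "1234"]
mask = [0, 0, 0, 1, 1]
s_ma = sensitive_memorization_accuracy(pred, ref, mask)
```

Here `s_ma` is 0.5: non-sensitive tokens ("my", "phone", "is") do not count toward the score, which mirrors how the paper's span annotations focus evaluation on the information that actually needs to be forgotten.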
Overall, SEUL provides a more efficient and effective approach to unlearning sensitive information in language models, with a focus on preserving model capabilities and ensuring privacy. The proposed evaluation metrics and annotation methods contribute to the advancement of machine unlearning techniques in the context of large language models.