TOFU: A Task of Fictitious Unlearning for LLMs


11 Jan 2024 | Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter
Abstract: Large language models (LLMs) trained on massive web corpora can memorize and reproduce sensitive information, raising legal and ethical concerns. Unlearning, i.e. tuning a model so that it forgets particular training data, offers a way to protect private data, but it is unclear how effective current unlearning methods actually are. TOFU is a benchmark designed to evaluate unlearning. It provides a dataset of 200 synthetic author profiles, a subset of which is designated the forget set, together with metrics for assessing unlearning efficacy. Baseline results show that current methods are weak, motivating further research.

Introduction: LLMs trained on large datasets face privacy and security issues. Unlearning is a promising approach for removing sensitive data, but evaluating it is challenging. TOFU addresses this by providing a well-defined unlearning task on a fully synthetic dataset, with three severity levels that require forgetting 2, 10, or 20 of the 200 authors. The task requires unlearning with compute on the order of the number of forget samples, i.e. O(|forget set|). The dataset is released on Hugging Face.

Evaluation Metrics: TOFU introduces new evaluation schemes for unlearning that consider both model utility and forget quality. Model utility is measured using performance metrics on new held-out datasets. Forget quality is assessed by comparing the probability of generating true answers to that of generating false answers on the forget set, and a statistical test compares unlearned models against retain models, i.e. models retrained only on the retained data.

Baseline Methods: Four baseline unlearning methods are evaluated at all three severity levels: gradient ascent, gradient difference, KL minimization, and preference optimization. The results show that current methods are weak and that unlearning is challenging: models often fail to achieve meaningful forget quality, indicating the need for better unlearning techniques.
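To make one of the baselines above concrete, here is a minimal, framework-free sketch of the gradient-difference idea: take gradient-ascent steps on the loss over forget examples while taking descent steps on the loss over retain examples. The 1-parameter logistic model, the data, and the learning rate are purely illustrative assumptions for this sketch, not the paper's LLM setup.

```python
import math

def nll(w, x, y):
    """Negative log-likelihood of label y under a toy 1-parameter logistic model."""
    p = 1.0 / (1.0 + math.exp(-w * x))
    return -math.log(p if y == 1 else 1.0 - p)

def grad_nll(w, x, y):
    """Derivative d/dw of the negative log-likelihood above."""
    p = 1.0 / (1.0 + math.exp(-w * x))
    return (p - y) * x

# One "forget" example and one "retain" example (illustrative data only).
forget = (2.0, 1)   # (x, y) pair the model should unlearn
retain = (1.0, 0)   # (x, y) pair whose behavior should be preserved

w, lr = 1.0, 0.1
for _ in range(50):
    # Gradient difference: descend the retain loss while ascending the
    # forget loss, i.e. take a descent step on (L_retain - L_forget).
    w -= lr * (grad_nll(w, *retain) - grad_nll(w, *forget))

# After unlearning, the forget example should be poorly predicted (high loss)
# while the retain example stays well predicted (low loss).
print(nll(w, *forget), nll(w, *retain))
```

In this toy, the ascent term drives the parameter away from the value that fits the forget example, while the retain term keeps the retained behavior intact; the paper's actual baselines apply the same idea to LLM fine-tuning losses over the forget and retain subsets.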
Discussion: TOFU provides a well-defined unlearning task but has limitations. It focuses on entity-level unlearning (forgetting everything about an author) rather than instance-level unlearning (forgetting individual facts), and it does not consider alignment to human values. The benchmark targets measuring unlearning efficacy, yet finding a suitable retain set is itself a challenge; TOFU could be updated to include constraints on using the original retain set.

Conclusion: TOFU is a benchmark for evaluating unlearning that highlights the challenges and limitations of current methods and motivates further research into better unlearning algorithms. It focuses on entity-level unlearning and provides a framework for evaluating unlearning efficacy, though it remains limited with respect to alignment to human values and the scope of unlearning methods covered.
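As a closing illustration of the forget-quality test described in the evaluation section, here is a minimal sketch: compute a per-question "truth ratio" (how the model's probability of wrong answers compares to its probability of the true answer) for both an unlearned model and a retain model, then compare the two distributions with a two-sample Kolmogorov-Smirnov statistic. The helper names, the simplified normalization, and all numbers below are assumptions for this sketch; the paper's metric additionally uses paraphrased/perturbed answers and the test's p-value rather than the raw statistic.

```python
def truth_ratio(p_true, p_false_list):
    """Simplified truth ratio: mean probability of the wrong (perturbed)
    answers divided by the probability of the true answer."""
    return (sum(p_false_list) / len(p_false_list)) / p_true

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of the two samples."""
    def ecdf(sample, v):
        return sum(1 for s in sample if s <= v) / len(sample)
    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in sorted(set(xs) | set(ys)))

# Truth ratios on the forget set for a retain model (never trained on the
# forget authors) and an unlearned model -- illustrative numbers only.
retain_ratios  = [1.1, 0.9, 1.3, 1.0, 1.2]
unlearn_ratios = [0.4, 0.5, 0.3, 0.6, 0.5]

# A small statistic would mean the unlearned model is statistically
# indistinguishable from the retain model on the forget set; here the
# two samples are well separated, so the statistic is large.
print(ks_statistic(retain_ratios, unlearn_ratios))
```

The design point this captures is that forget quality is defined relative to a retain model: an unlearned model "passes" only when its behavior on the forget set cannot be distinguished from that of a model that never saw the forgotten data.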