Digital Forgetting in Large Language Models: A Survey of Unlearning Methods

April 3, 2024 | Alberto Blanco-Justicia, Najeeb Jebreel, Benet Manzanares, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, and Kuan Eeik Tan
This paper presents a survey of unlearning methods for large language models (LLMs), focusing on digital forgetting. LLMs have become the state of the art in natural language processing (NLP) and natural language understanding (NLU) tasks, but they raise concerns about privacy, copyright, model robustness, and alignment with human values. Digital forgetting aims to remove undesirable knowledge or behavior from LLMs. Effective digital forgetting mechanisms must balance the effectiveness of forgetting, the preservation of model performance on desirable tasks, and the timeliness and scalability of the forgetting procedure. The paper provides background on LLMs, including their components, types, and training pipeline; describes the motivations, types, and desired properties of digital forgetting; and introduces approaches to digital forgetting in LLMs, among which unlearning methodologies are the state of the art. It then provides a detailed taxonomy of unlearning methods for LLMs and surveys and compares current approaches. It details the datasets, models, and metrics used to evaluate forgetting, retention, and runtime. It discusses open challenges in the area, including guarantees of forgetting, retention of model utility, generalization of unlearning, runtime and scalability, and evaluation, and it addresses when each method can be used as well as the black-box access scenario. The paper closes with concluding remarks.
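To make the trade-off between forgetting effectiveness and retained utility concrete, the sketch below shows one common family of approximate unlearning updates: fine-tuning that increases the loss on a forget set while preserving it on a retain set. This is an illustrative example rather than the survey's own method, and the names `forget_batch`, `retain_batch`, and `lambda_retain` are assumptions introduced here; it also assumes a Hugging Face-style causal language model that returns a `.loss` when `labels` are provided.

```python
# Minimal sketch (illustrative, not from the survey): gradient ascent on data to
# be forgotten, combined with a standard retain-set loss to limit utility loss.
import torch


def unlearning_step(model, optimizer, forget_batch, retain_batch, lambda_retain=1.0):
    """One update: raise the loss on the forget batch, keep it low on the retain batch.

    Batches are assumed to contain `input_ids` and `attention_mask` tensors.
    """
    model.train()
    optimizer.zero_grad()

    # Causal-LM loss on the forget data; the minus sign below turns its
    # minimization into gradient ascent (i.e., "unlearning" these sequences).
    forget_out = model(**forget_batch, labels=forget_batch["input_ids"])

    # Standard loss on retained data, weighted by lambda_retain, to preserve
    # performance on desirable tasks.
    retain_out = model(**retain_batch, labels=retain_batch["input_ids"])

    loss = -forget_out.loss + lambda_retain * retain_out.loss
    loss.backward()
    optimizer.step()
    return forget_out.loss.item(), retain_out.loss.item()
```

The `lambda_retain` weight is where the balance described above shows up in practice: larger values protect model utility but slow forgetting, while smaller values forget faster at the risk of degrading performance on retained tasks.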
[slides and audio] Digital forgetting in large language models: a survey of unlearning methods