April 3, 2024 | Alberto Blanco-Justicia, Najeeb Jebreel, Benet Manzanares, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan
The paper "Digital Forgetting in Large Language Models: A Survey of Unlearning Methods" by Alberto Blanco-Justicia et al. explores the challenges and methods for digital forgetting in large language models (LLMs). The authors discuss the motivations for digital forgetting, including privacy, copyright protection, model robustness, and alignment with human values. They categorize the types of digital forgetting requests, such as general forgetting, item removal, feature removal, class removal, and task removal. The paper outlines the requirements for effective digital forgetting, including guarantees, generalization, performance retention, and runtime scalability.
The authors then survey approaches to digital forgetting in LLMs, focusing on unlearning methods, which they group into four primary types: global weight modification, local weight modification, architecture modification, and input/output modification. Global weight modification methods, such as data sharding, may alter all of the model's parameters; local weight modification methods restrict updates to specific subsets of weights; architecture modification methods add layers or components to the model; and input/output modification methods act on the model's inputs and outputs rather than on its weights.
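To make the taxonomy more concrete, below is a minimal, illustrative sketch of one global weight modification strategy: gradient ascent on the data to be forgotten, paired with an ordinary descent step on retained data to limit utility loss. This is not code from the survey; the model, data, and hyperparameters are toy placeholders chosen only to show the shape of the approach.

```python
# Toy sketch of gradient-ascent unlearning (a global weight modification approach).
# The model, data, and hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, seq = 100, 32, 8
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Flatten(), nn.Linear(seq * dim, vocab))

def random_batch(n=16):
    """Stand-in for real training data: token sequences and next-token targets."""
    return torch.randint(0, vocab, (n, seq)), torch.randint(0, vocab, (n,))

forget_x, forget_y = random_batch()   # examples whose influence should be removed
retain_x, retain_y = random_batch()   # examples whose performance should be preserved

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    optimizer.zero_grad()
    # Ascend on the forget set (note the minus sign) while descending on the retain set;
    # every parameter in the model may change, hence "global" weight modification.
    loss = -loss_fn(model(forget_x), forget_y) + loss_fn(model(retain_x), retain_y)
    loss.backward()
    optimizer.step()
```

Data sharding, the other global strategy mentioned above, instead achieves forgetting by retraining the shard-specific models affected by a removal request without the forgotten data, trading extra compute and storage for stronger guarantees.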
The paper provides a detailed taxonomy of unlearning methods and reviews how their effectiveness is evaluated, covering the datasets, models, metrics, and attacks used in the literature. It then discusses open challenges in digital forgetting, such as providing forgetting guarantees, retaining model utility, generalization, runtime, and scalability, and concludes with recommendations for future research and for practical applications of digital forgetting in LLMs.
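As an illustration of what such an evaluation might look like in practice, the sketch below compares a model's loss on the forget and retain sets before and after unlearning; a successful run should raise the forget-set loss markedly while leaving the retain-set loss nearly unchanged. This is an assumed, generic check for illustration, not the specific metrics used in the survey.

```python
# Illustrative forgetting check (assumed protocol, not the survey's own metrics):
# compare mean loss on forget/retain data before and after unlearning.
import torch
import torch.nn as nn

@torch.no_grad()
def mean_loss(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Average cross-entropy of the model's predictions on a labeled batch."""
    return nn.functional.cross_entropy(model(x), y).item()

def forgetting_report(original, unlearned, forget_batch, retain_batch):
    """A large positive forget delta with a near-zero retain delta suggests effective unlearning."""
    fx, fy = forget_batch
    rx, ry = retain_batch
    return {
        "forget_loss_delta": mean_loss(unlearned, fx, fy) - mean_loss(original, fx, fy),
        "retain_loss_delta": mean_loss(unlearned, rx, ry) - mean_loss(original, rx, ry),
    }
```

Stronger evaluations reported in the literature additionally probe the unlearned model with extraction or membership inference attacks, which a simple loss comparison like this does not capture.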