What makes unlearning hard and what to do about it

2024-10-30 | Kairan Zhao, Meghdad Kurmanji, George-Octavian Barbulescu, Eleni Triantafillou, Peter Triantafillou
This paper investigates the factors that make machine unlearning difficult and proposes a framework to improve unlearning performance. Machine unlearning aims to remove the influence of a subset of training data (the "forget set") from a trained model, often to comply with user data deletion requests or to remove problematic data. The study identifies two key factors affecting unlearning difficulty: the entanglement between the forget and retain sets in the embedding space, and the degree to which the forget-set examples are memorized. The more entangled the sets are, and the more memorized the forget-set examples are, the harder unlearning becomes.

The paper evaluates various unlearning algorithms on different forget sets and finds that different algorithms excel under different conditions: for example, relabelling-based algorithms perform poorly on highly entangled forget sets but well on highly memorized ones. The study also reveals that unlearning algorithms can fail in unexpected ways when dealing with complex data. To address these challenges, the authors propose the Refined-Unlearning Meta-algorithm (RUM), which involves two steps: first, refining the forget set into homogeneous subsets based on the relevant difficulty factors, and second, applying a suitable unlearning algorithm to each subset and combining the results.

RUM significantly improves the performance of state-of-the-art unlearning algorithms by leveraging the strengths of different algorithms for different types of forget sets. The paper also introduces a new metric, ToW (tug-of-war), which evaluates unlearning performance by balancing accuracy on the forget, retain, and test sets. The results show that RUM outperforms existing methods, especially when applied to homogeneous subsets of the forget set.
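The two-step RUM recipe described above can be sketched in code. This is a simplified illustration, not the authors' implementation: the bucketing rule (quantiles of a memorization score) and the function names (`rum`, the per-bucket `unlearners` callables) are assumptions chosen for clarity.

```python
import numpy as np

def rum(model, forget_set, retain_set, mem_scores, unlearners, n_buckets=3):
    """Sketch of the Refined-Unlearning Meta-algorithm (RUM).

    Step 1 (refinement): partition the forget set into homogeneous
    buckets by memorization score. Step 2 (meta-unlearning): apply a
    per-bucket unlearning algorithm, carrying the partially unlearned
    model forward between buckets.
    """
    # Refinement: quantile edges give equal-population score buckets.
    edges = np.quantile(mem_scores, np.linspace(0.0, 1.0, n_buckets + 1))
    bucket_idx = np.digitize(mem_scores, edges[1:-1])  # 0 .. n_buckets-1
    buckets = [
        [x for x, b in zip(forget_set, bucket_idx) if b == i]
        for i in range(n_buckets)
    ]
    # Meta-unlearning: each bucket gets its own unlearning algorithm.
    for bucket, unlearn in zip(buckets, unlearners):
        if bucket:
            model = unlearn(model, bucket, retain_set)
    return model
```

Each `unlearn` callable stands in for a concrete algorithm (e.g. a relabelling-based method for highly memorized buckets), matching the paper's observation that different algorithms win on different forget-set types.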
Additionally, the study finds that using a compute-efficient proxy for memorization, called C-proxy, can achieve similar performance gains without the high computational cost of calculating memorization scores. The paper concludes that understanding the factors affecting unlearning difficulty is crucial for improving unlearning algorithms and developing effective evaluation metrics. The proposed RUM framework provides a new approach to unlearning by leveraging the strengths of different algorithms for different types of forget sets, and it has the potential to significantly improve the state-of-the-art in unlearning research.
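The ToW metric mentioned above can be sketched as follows. This is one plausible reading of "balancing accuracy on the forget, retain, and test sets": the score rewards an unlearned model whose per-split accuracies are close to those of a model retrained from scratch without the forget set. The exact functional form and the `tow` name are assumptions; accuracies are fractions in [0, 1].

```python
def tow(acc_unlearned, acc_retrained):
    """Sketch of a tug-of-war (ToW) style score.

    Takes two dicts mapping split name -> accuracy and multiplies
    one factor per split, each penalizing the gap between the
    unlearned model and the retrained-from-scratch reference.
    A perfect match on all three splits yields 1.0.
    """
    score = 1.0
    for split in ("forget", "retain", "test"):
        score *= 1.0 - abs(acc_unlearned[split] - acc_retrained[split])
    return score
```

Using the retrained model as the reference captures the "tug of war": pushing forget-set accuracy down too aggressively, or letting retain/test accuracy degrade, both pull the score below 1.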