21 Feb 2024 | Shashwat Goel*, Ameya Prabhu*, Philip Torr, Ponnurangam Kumaraguru, and Amartya Sanyal
Corrective Machine Unlearning addresses the challenge of removing the impact of manipulated training data from a trained model, especially when only a subset of the affected data has been identified. Traditional unlearning approaches, such as retraining from scratch on the retained data, typically require most of the manipulated data to be identified before they correct the model. One method, Selective Synaptic Dampening (SSD), shows promise by achieving limited success with only a small portion of the manipulated data, demonstrating that this setting is tractable.

The study evaluates several unlearning methods against two types of manipulation: poisoning and interclass confusion. SSD removes the influence of poisoned data when only 10% of the manipulated samples are identified, though at the cost of a noticeable drop in model accuracy, whereas methods such as exact unlearning (EU) and catastrophic forgetting (CF) perform poorly when only a small fraction of the manipulated data is known. The authors argue for unlearning procedures designed specifically to remove the influence of manipulated data, balancing removal of the manipulation's effects against retention of model utility on clean data. While SSD is effective against certain manipulations, it is not universally applicable, pointing to the need for more robust unlearning methods to address the data-integrity challenges that arise from web-scale training.
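To make the SSD idea more concrete, below is a minimal sketch of Fisher-based parameter dampening in PyTorch. It assumes a `model`, a `forget_loader` over the identified manipulated samples, a `full_loader` over the full training set, a `loss_fn`, and hyperparameters `alpha` (selection threshold) and `lam` (dampening constant); all of these names and defaults are illustrative assumptions for this sketch, not the paper's or the reference SSD implementation's API.

```python
# Hedged sketch of SSD-style dampening: parameters that are disproportionately
# important to the identified (forget) samples, relative to the full training
# data, are scaled down. Names and hyperparameters are illustrative.
import torch

def diag_fisher(model, loader, loss_fn, device="cpu"):
    """Accumulate a diagonal Fisher (squared-gradient) importance per parameter."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return fisher

def ssd_dampen(model, forget_loader, full_loader, loss_fn, alpha=10.0, lam=1.0):
    """Shrink weights whose importance to the forget set exceeds
    alpha times their importance to the overall training data."""
    f_forget = diag_fisher(model, forget_loader, loss_fn)
    f_full = diag_fisher(model, full_loader, loss_fn)
    with torch.no_grad():
        for n, p in model.named_parameters():
            important = f_forget[n] > alpha * f_full[n]
            scale = torch.where(
                important,
                (lam * f_full[n] / (f_forget[n] + 1e-12)).clamp(max=1.0),
                torch.ones_like(p),
            )
            p.mul_(scale)  # dampen only the selected parameters
```

Because the selection is relative (forget-set importance versus full-data importance), such a procedure can act even when only a fraction of the manipulated samples is identified, which also hints at why it can trade away clean accuracy: dampened parameters may still carry useful signal for unaffected data.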