Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning

4 Apr 2024 | Hongsheng Hu*, Shuo Wang†, Tian Dong† and Minhui Xue*
This paper introduces unlearning inversion attacks, which reveal the feature and label information of unlearned data given access to both the original and the unlearned model. The attacks exploit the differences between the two models to infer private information, exposing a previously underexplored privacy vulnerability in machine unlearning. The paper lays out a detailed threat model for MLaaS scenarios, in which both the server and users can potentially recover the features and labels of unlearned data. Evaluations on benchmark datasets across various model architectures and unlearning methods show that the attacks successfully recover unlearned data features and infer their labels, even when the adversary has limited knowledge. Three defense strategies are proposed, but each comes at the cost of reduced model utility. The study underscores the need for careful design of unlearning mechanisms to prevent information leakage.
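To make the "exploit the model difference" idea concrete, below is a minimal sketch of a gradient-inversion-style attack. It assumes (for illustration only) that approximate unlearning amounts to a single gradient-ascent step on the forgotten sample, so the parameter difference between the original and unlearned models is proportional to that sample's gradient; the attacker then optimizes a dummy input and soft label until the gradient they induce on the original model aligns with the observed difference. The toy model, the single-step unlearning assumption, and all hyperparameters are hypothetical and do not reproduce the paper's exact attack.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for the victim classifier (hypothetical, not the paper's setup).
model_orig = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model_unlearned = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model_unlearned.load_state_dict(model_orig.state_dict())

loss_fn = nn.CrossEntropyLoss()

# The sample to be "forgotten" (unknown to the attacker).
secret_x = torch.randn(1, 3, 32, 32)
secret_y = torch.tensor([7])

# Simulate approximate unlearning as one gradient-ascent step on the
# forgotten sample's loss -- an assumption made purely for this sketch.
grads_secret = torch.autograd.grad(
    loss_fn(model_orig(secret_x), secret_y), model_orig.parameters())
with torch.no_grad():
    for p_u, g in zip(model_unlearned.parameters(), grads_secret):
        p_u.add_(0.5 * g)

# Attacker view: the parameter difference between the two models is
# proportional to the forgotten sample's gradient.
delta = [(p_u - p_o).detach()
         for p_o, p_u in zip(model_orig.parameters(),
                             model_unlearned.parameters())]

# Optimize a dummy input and soft label so that the gradient they induce
# on the original model matches the observed difference (gradient inversion).
x = torch.randn(1, 3, 32, 32, requires_grad=True)
y = torch.randn(1, 10, requires_grad=True)
opt = torch.optim.Adam([x, y], lr=0.1)

for step in range(300):
    opt.zero_grad()
    loss = loss_fn(model_orig(x), y.softmax(dim=1))
    grads = torch.autograd.grad(loss, model_orig.parameters(),
                                create_graph=True)
    # Cosine mismatch between the induced gradients and the model difference.
    mismatch = sum(1 - F.cosine_similarity(g.flatten(), d.flatten(), dim=0)
                   for g, d in zip(grads, delta))
    mismatch.backward()
    opt.step()

# x now approximates the forgotten features; the soft label reveals its class.
print("inferred label:", y.softmax(dim=1).argmax().item())  # ideally 7
```

In the paper's threat model, this kind of white-box gradient matching corresponds to the server-side attacker recovering features; a user with only query access would instead compare the two models' outputs to infer labels of the unlearned data.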