Hao Fang, Yixiang Qiu, Hongyao Yu, Wenbo Yu, Jiawei Kong, Baoli Chong, Bin Chen, Xuan Wang, Shu-Tao Xia, Ke Xu
This paper provides a comprehensive survey of model inversion (MI) attacks and defenses on deep neural networks (DNNs). MI attacks aim to reconstruct private training data from a released DNN, posing significant privacy risks. The paper first reviews early MI studies in traditional machine learning (ML) scenarios, then analyzes and compares numerous recent MI attacks and defenses on DNNs across various data modalities and learning tasks, organizing these methods under a novel taxonomy. The paper also discusses promising research directions and potential solutions to open issues. To facilitate further study, an open-source model inversion toolbox is provided.
MI attacks can be categorized as white-box or black-box according to the attacker's knowledge of the target model: white-box attackers have full access to the model's weights and intermediate outputs, while black-box attackers observe only the predicted confidence scores or hard labels. MI attacks target various data modalities, including tabular, image, text, and graph data. For image data, attacks typically optimize an input image, or the latent code of a generative model, until the target model assigns high confidence to the victim class, yielding high-resolution reconstructions. For text data, attacks focus on recovering sensitive training information from language models. For graph data, attacks aim to reconstruct private graph structure, such as edges between nodes, from trained graph models.
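To make the white-box setting concrete, below is a minimal sketch of an optimization-based MI attack in PyTorch. The names `target_model` and `target_class`, the image shape, and all hyperparameters are illustrative assumptions rather than the procedure of any specific paper: the sketch performs gradient ascent on a randomly initialized input so the released classifier assigns high confidence to the target class, with a total-variation prior as a simple image regularizer.

```python
import torch
import torch.nn.functional as F

def whitebox_mi_attack(target_model, target_class, image_shape=(1, 3, 64, 64),
                       steps=1000, lr=0.1, tv_weight=1e-4):
    """Gradient-ascent MI sketch: recover an input that the released model
    labels as `target_class` with high confidence (assumed setup)."""
    x = torch.randn(image_shape, requires_grad=True)   # random start in pixel space
    optimizer = torch.optim.Adam([x], lr=lr)
    label = torch.tensor([target_class])
    for _ in range(steps):
        optimizer.zero_grad()
        # identity loss: push the model's prediction toward the target class
        id_loss = F.cross_entropy(target_model(x), label)
        # total-variation prior: a generic smoothness regularizer on the image
        tv = ((x[..., 1:, :] - x[..., :-1, :]).abs().mean()
              + (x[..., :, 1:] - x[..., :, :-1]).abs().mean())
        (id_loss + tv_weight * tv).backward()
        optimizer.step()
    return x.detach()   # candidate reconstruction of a training-class sample
```

In the black-box setting this gradient step is unavailable, which is why label-only and score-only attacks resort to gradient estimation or to training surrogate models instead.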
The paper discusses different attack strategies, covering generative model design, data initialization, and the attack optimization process. It also surveys auxiliary techniques such as pseudo-label guidance, input augmentation, and results selection that improve attack effectiveness. The paper highlights open challenges for MI attacks on DNNs, including the need for robust defenses and a deeper understanding of the mechanisms that make these attacks succeed, and concludes with a discussion of their social impact and future research directions.
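As an illustration of the generative strategy combined with results selection, here is a hedged sketch under assumed names: a pretrained GAN `generator` maps latent codes to images, the latent codes are optimized instead of raw pixels, and the candidate the released model scores most confidently is kept. The hyperparameters are placeholders, not the settings of any particular attack.

```python
import torch
import torch.nn.functional as F

def gan_mi_attack(generator, target_model, target_class, latent_dim=100,
                  n_candidates=32, steps=500, lr=0.02):
    """Generative MI sketch with results selection: optimize GAN latent
    codes, then keep the candidate the target model rates highest."""
    z = torch.randn(n_candidates, latent_dim, requires_grad=True)
    labels = torch.full((n_candidates,), target_class, dtype=torch.long)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        images = generator(z)               # decode latents to image space
        loss = F.cross_entropy(target_model(images), labels)
        loss.backward()
        optimizer.step()
    # results selection: rank candidates by target-class confidence
    with torch.no_grad():
        probs = F.softmax(target_model(generator(z)), dim=1)[:, target_class]
        best = probs.argmax().item()
        return generator(z[best:best + 1])
```

Optimizing in latent space constrains reconstructions to the generator's image manifold, which is why generative MI attacks tend to recover far more realistic samples than direct pixel-space optimization.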