31 Mar 2017 | Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov
We investigate how machine learning models leak information about the individual data records they were trained on, focusing on membership inference attacks: given a data record and black-box access to a model, determine whether the record was part of the model's training dataset. Our attack model is trained using shadow models that mimic the target model's behavior, allowing us to distinguish the target's outputs on members of its training data from its outputs on non-members; a sketch of this pipeline appears below. We develop several methods to generate training data for the shadow models, using black-box queries to the target, population statistics, or noisy versions of the real training data. We evaluate our techniques on commercial machine-learning-as-a-service platforms from Google and Amazon, using realistic datasets and classification tasks, including a sensitive hospital discharge dataset. The attacks identify training records with high accuracy, with the degree of leakage varying by classification task and by how much the target model overfits, and they remain robust even when the attacker's assumptions about the target model's training data distribution are inaccurate. We analyze the factors that influence leakage, evaluate mitigation strategies, and highlight the privacy risks of machine learning models that are trained on sensitive data and exposed publicly.
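The shadow-model pipeline can be summarized in a short sketch. The following Python example, a minimal illustration using scikit-learn on synthetic data, assumes the attacker can draw records from a distribution similar to the target's training data; the random-forest models, dataset sizes, number of shadow models, and the single pooled attack classifier are assumptions chosen for brevity, not the paper's exact setup (which, for instance, trains one attack model per output class).

```python
# A minimal sketch of the shadow-model membership inference pipeline described
# above, using scikit-learn and synthetic data. The dataset, model family,
# and hyperparameters are illustrative assumptions, not the paper's exact setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Records assumed to come from (roughly) the same distribution as the target's
# training data; here they are simply synthesized.
X, y = make_classification(n_samples=12000, n_features=20, n_informative=10,
                           n_classes=2, random_state=0)

# Target model: the attacker only observes its prediction vectors (black-box).
X_target_train, X_pool, y_target_train, y_pool = train_test_split(
    X, y, train_size=2000, random_state=1)
target_model = RandomForestClassifier(n_estimators=50, random_state=1)
target_model.fit(X_target_train, y_target_train)

# Shadow models: trained on disjoint slices of attacker-controlled data, so the
# attacker knows exactly which records were "in" vs "out" for each shadow.
n_shadows, shadow_size = 4, 1000
perm = rng.permutation(len(X_pool))
attack_X, attack_y = [], []
for i in range(n_shadows):
    idx = perm[i * 2 * shadow_size:(i + 1) * 2 * shadow_size]
    X_in, y_in = X_pool[idx[:shadow_size]], y_pool[idx[:shadow_size]]
    X_out, y_out = X_pool[idx[shadow_size:]], y_pool[idx[shadow_size:]]
    shadow = RandomForestClassifier(n_estimators=50, random_state=i)
    shadow.fit(X_in, y_in)
    # Attack training records: (prediction vector, true class) -> member label.
    attack_X.append(np.hstack([shadow.predict_proba(X_in), y_in[:, None]]))
    attack_y.append(np.ones(shadow_size))       # members of this shadow's data
    attack_X.append(np.hstack([shadow.predict_proba(X_out), y_out[:, None]]))
    attack_y.append(np.zeros(shadow_size))      # non-members
attack_X, attack_y = np.vstack(attack_X), np.concatenate(attack_y)

# Attack model: binary classifier separating member from non-member prediction
# vectors (the paper trains one attack model per output class; a single pooled
# classifier is used here for brevity).
attack_model = RandomForestClassifier(n_estimators=100, random_state=2)
attack_model.fit(attack_X, attack_y)

# Evaluate against the real target with known members and held-out non-members.
def membership_features(model, X_q, y_q):
    return np.hstack([model.predict_proba(X_q), y_q[:, None]])

held_out = perm[n_shadows * 2 * shadow_size:]
member_pred = attack_model.predict(
    membership_features(target_model, X_target_train, y_target_train))
nonmember_pred = attack_model.predict(
    membership_features(target_model, X_pool[held_out], y_pool[held_out]))
accuracy = (member_pred.mean() + (1 - nonmember_pred).mean()) / 2
print(f"membership inference accuracy: {accuracy:.2f}")
```

The gap this sketch exploits is the one the paper identifies: an overfitted model tends to return noticeably more confident prediction vectors for records it was trained on than for unseen records from the same distribution, and the attack classifier learns to recognize that difference from the shadow models' behavior.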