Exploiting Unintended Feature Leakage in Collaborative Learning


1 Nov 2018 | Luca Melis, Congzheng Song, Emiliano De Cristofaro, Vitaly Shmatikov
This paper investigates unintended feature leakage in collaborative learning, where participants train a joint model on their own data and periodically share model updates. The authors demonstrate that these updates can leak information about participants' training data, enabling both passive and active inference attacks. An adversarial participant can infer the presence of exact data points in other participants' training data (membership inference) as well as properties that hold for subsets of the training data, even when those properties are unrelated to the model's main task; for example, the attack can infer when a specific person first appears in photos used to train a gender classifier. The attacks are evaluated on a variety of tasks, datasets, and learning configurations, together with an analysis of their limitations and a discussion of possible defenses.
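
As a rough illustration of the passive attack surface, the sketch below shows how an adversarial participant could label model-update snapshots by whether a target property was present in its own auxiliary data and train a binary classifier on those update vectors. This is not the authors' implementation: the model, data, and names (SimpleNet, aux_with_prop, observed_update, the RandomForest attack model) are hypothetical stand-ins, and a real attack would operate on the aggregated updates observed across rounds of collaborative training.

```python
# Minimal sketch of passive property inference from shared model updates.
# All data here is synthetic; in practice the adversary would record the
# updates it observes during collaborative training.
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

class SimpleNet(nn.Module):
    def __init__(self, dim_in=32, dim_out=2):
        super().__init__()
        self.fc = nn.Linear(dim_in, dim_out)
    def forward(self, x):
        return self.fc(x)

def batch_gradient(model, loss_fn, x, y):
    """Flattened gradient of the loss on one batch -- a proxy for the
    update a participant holding this batch would contribute."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.detach().flatten() for p in model.parameters()])

model, loss_fn = SimpleNet(), nn.CrossEntropyLoss()

# Adversary's auxiliary data: batches known to have / not have the property.
aux_with_prop = [(torch.randn(8, 32), torch.randint(0, 2, (8,))) for _ in range(50)]
aux_without_prop = [(torch.randn(8, 32), torch.randint(0, 2, (8,))) for _ in range(50)]

features, labels = [], []
for batches, has_prop in ((aux_with_prop, 1), (aux_without_prop, 0)):
    for x, y in batches:
        features.append(batch_gradient(model, loss_fn, x, y).numpy())
        labels.append(has_prop)

attack_clf = RandomForestClassifier(n_estimators=100).fit(features, labels)

# At attack time, score an observed update (here simulated with a random
# batch standing in for another participant's data).
observed_update = batch_gradient(model, loss_fn, torch.randn(8, 32),
                                 torch.randint(0, 2, (8,))).numpy()
print("P(property present) =", attack_clf.predict_proba([observed_update])[0, 1])
```
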
A key observation is that modern deep learning models internally develop separate representations of many features, some of which are independent of the task being learned. These "unintended" features are what leaks information about participants' training data. An active adversary can go further: using multi-task learning, it can trick the joint model into learning better internal separations of the features it is interested in, thereby extracting even more information.

The attacks have direct privacy implications. For example, the authors infer with high accuracy that a certain person appears in a single training batch even if half of the photos in the batch depict other people, infer the specialty of the doctor being reviewed with perfect accuracy, and identify the author of reviews even when those reviews account for less than a third of the batch. Measuring attack performance as the number of participants grows, they find that on image-classification tasks the AUC degrades once the number of participants exceeds a dozen or so, whereas on sentiment analysis of Yelp reviews the AUC of author identification remains high for many authors even with 30 participants.
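
The multi-task idea behind the active attack can be sketched as follows: the adversarial participant trains its local copy of the joint model on the main task and on a property-classification head attached to the shared layers, nudging the shared representation to separate the property of interest. The names below (SharedBody, property_head, alpha) are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of the active (multi-task) variant of the attack.
import torch
import torch.nn as nn

class SharedBody(nn.Module):
    """Stand-in for the layers of the collaboratively trained model."""
    def __init__(self, dim_in=32, dim_hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU())
    def forward(self, x):
        return self.net(x)

body = SharedBody()
main_head = nn.Linear(64, 2)      # main task, e.g. gender classification
property_head = nn.Linear(64, 2)  # adversary's property, e.g. identity present?

params = list(body.parameters()) + list(main_head.parameters()) + list(property_head.parameters())
opt = torch.optim.SGD(params, lr=0.1)
loss_fn, alpha = nn.CrossEntropyLoss(), 1.0  # alpha weights the property task

# One local step on a synthetic batch labelled with both the main-task
# label and the property label known only to the adversary.
x = torch.randn(16, 32)
y_main = torch.randint(0, 2, (16,))
y_prop = torch.randint(0, 2, (16,))

opt.zero_grad()
h = body(x)
loss = loss_fn(main_head(h), y_main) + alpha * loss_fn(property_head(h), y_prop)
loss.backward()
opt.step()
# Only the updates to the shared layers (and the main head) are sent to the
# other participants; the property head stays local to the adversary.
```
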
The paper also examines possible defenses, such as sharing fewer gradients, reducing the dimensionality of the input space, and dropout, and finds that none of them effectively thwart the attacks. The authors additionally try participant-level differential privacy, which is geared to settings with thousands of users; in their setting, the joint model fails to converge.
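
For concreteness, the helper below sketches the "share fewer gradients" defense: each participant uploads only the fraction of gradient coordinates with the largest magnitude and zeroes out the rest. The function name and parameters are hypothetical, and the paper reports that even aggressive variants of this idea do not stop the property-inference attacks.

```python
# Hypothetical sketch of the "share fewer gradients" defense.
import torch

def sparsify_update(update: torch.Tensor, share_fraction: float = 0.1) -> torch.Tensor:
    """Keep roughly the top `share_fraction` of coordinates by absolute value."""
    k = max(1, int(share_fraction * update.numel()))
    flat = update.flatten()
    threshold = flat.abs().topk(k).values.min()
    mask = (flat.abs() >= threshold).float()
    return (flat * mask).view_as(update)

# Example: a participant sparsifies its update before sharing it.
full_update = torch.randn(1000)
shared = sparsify_update(full_update, share_fraction=0.1)
print("non-zero coordinates shared:", int((shared != 0).sum()))
```
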
The authors conclude that collaborative learning can leak sensitive information about participants' training data and call for further research into unintended feature leakage in collaborative learning.