This paper introduces privacy backdoors, a new type of attack that compromises the privacy of data used to finetune pretrained machine learning models: by tampering with a pretrained model's weights, an attacker can fully compromise the privacy of the finetuning data. The authors demonstrate how to build privacy backdoors for various models, including transformers, that enable an attacker to reconstruct individual finetuning samples with high success rates. They further show that backdoored models allow for tight privacy attacks on models trained with differential privacy (DP), so the common optimistic practice of training DP models with loose privacy guarantees is insecure if the pretrained model is not trusted.
The authors propose a new single-use backdoor design: once the backdoor activates and a data point is written into the model's weights, the backdoor deactivates and prevents any further alteration of those weights during training. The backdoor thus acts like a latch, ensuring that a captured data point remains in the weights until the end of training.
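To make the latch intuition concrete, here is a minimal NumPy toy, not the authors' transformer construction: the constant upstream gradient c, the zero-initialized weights, and the non-negative features are simplifying assumptions. A single ReLU unit fires on the first finetuning sample, the resulting SGD step writes that sample into its weights and drives its bias negative, and from then on the unit stays off, so the sample can be read back from the weight difference.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lr, c = 16, 0.1, 1.0      # c: assumed constant positive upstream gradient dL/dh

# Planted "data trap" unit h = relu(w @ x + b), updated with plain SGD:
w = np.zeros(d)              # near-zero weights, so the captured input dominates the update
b = 0.01                     # small positive bias: the unit fires on the very first sample
w0, captured = w.copy(), None

for x in rng.uniform(0.0, 1.0, size=(50, d)):   # non-negative features, e.g. pixel intensities
    if w @ x + b > 0:                 # unit active: standard SGD step through the ReLU
        w -= lr * c * x               # the sample is written into the weights ...
        b -= lr * c                   # ... and the bias drop (lr*c > initial b) shuts the latch
        if captured is None:
            captured = x.copy()
    # with b < 0 and w entrywise <= 0, the unit never fires on later non-negative inputs

x_rec = (w0 - w) / (lr * c)           # attacker reads the trapped sample back out of the weights
print(np.allclose(x_rec, captured))   # True: the first finetuning sample is recovered exactly
```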
The authors apply these attacks to MLPs and pretrained transformers (ViT and BERT) and reconstruct dozens of finetuning examples across various downstream tasks. They also consider a more restrictive black-box threat model, in which the attacker has only query access to the finetuned model. By adapting techniques from the model extraction literature, they show that even a black-box attacker can recover entire training inputs. They further show that their backdoors enable simpler, perfect membership inference attacks, which infer with 100% accuracy whether a data point was used for training.
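One reason a trap of this kind yields a perfect membership test, sketched here as a toy decision rule under the same assumptions as above (plain SGD, no weight decay), and not the authors' exact construction: if the trap is wired so that only the targeted point can trigger it, then whether the latch moved during finetuning reveals membership without error.

```python
import numpy as np

def plant_trap_for_target(x_star: np.ndarray, margin: float = 1e-3):
    """Hypothetical trap relu(w @ x + b) that fires only on inputs very close to x_star
    (inputs assumed L2-normalized, so w @ x is a cosine similarity)."""
    w = x_star / np.linalg.norm(x_star)
    b = margin - 1.0          # positive pre-activation only when cos(x, x_star) > 1 - margin
    return w, b

def target_was_in_training_set(b_planted: float, b_finetuned: float) -> bool:
    """Under plain SGD the bias only moves if the trap fired, and only (near-duplicates of)
    x_star can fire it, so this toy membership test makes no errors."""
    return b_finetuned < b_planted
```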
The authors use these backdoors to build the first tight end-to-end attack on the seminal differentially private SGD (DPSGD) algorithm of Abadi et al. (2016). The attack shows that the privacy leakage observable by the adversary nearly matches the provable upper bound from the algorithm's privacy analysis, challenging the common assumption that DPSGD's guarantees are overly conservative in practice. The paper thus highlights a crucial and overlooked supply-chain attack on machine learning privacy.
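To make "tight" precise, here is a standard privacy-auditing calculation (not code from the paper): the attacker's observed true/false positive rates in a membership test imply a lower bound on ε, which can then be compared against the ε proven by DPSGD's analysis.

```python
import math

def epsilon_lower_bound(tpr: float, fpr: float, delta: float = 0.0) -> float:
    """Any membership test against an (eps, delta)-DP mechanism must satisfy
    TPR <= exp(eps) * FPR + delta  and  1 - FPR <= exp(eps) * (1 - TPR) + delta,
    so observed rates certify a lower bound on eps (sampling error is ignored here;
    real audits use confidence intervals over many trials)."""
    bounds = []
    if fpr > 0 and tpr > delta:
        bounds.append(math.log((tpr - delta) / fpr))
    if tpr < 1 and (1 - fpr) > delta:
        bounds.append(math.log((1 - fpr - delta) / (1 - tpr)))
    return max(bounds, default=0.0)

# E.g. a near-perfect attack (TPR 0.99 at FPR 0.01) certifies eps >= ln(99) ~ 4.6.
print(epsilon_lower_bound(0.99, 0.01))
```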