14 Mar 2024 | Ruixuan Liu, Tianhao Wang, Yang Cao, Li Xiong
PreCurious is a framework that reveals a new attack surface: an attacker releases a crafted pre-trained model and, with only black-box access to the resulting fine-tuned model, escalates privacy risks such as membership inference and data extraction. The key idea is to manipulate the memorization stage of the pre-trained model and steer fine-tuning with a seemingly legitimate configuration; the intuition is that controlling how much the pre-trained model has already memorized determines how much privacy risk the fine-tuned model inherits. Compared to fine-tuning on a benign model, PreCurious can break apparent invulnerability in a stealthy manner, and by leveraging a sanitized dataset it can extract secrets even under differentially private fine-tuning. This highlights the risk of using untrusted pre-trained models and the limitations of common-sense defenses such as differential privacy and deduplication, and it shows that publishing de-identified datasets can still expose secrets if they are later included in fine-tuning.
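The summary does not spell out the attack mechanics, but the membership inference an attacker in this position could run follows the standard reference-calibration recipe: score a candidate record by how much the fine-tuned model's loss drops relative to the attacker's own released pre-trained model, which doubles as a perfectly matched reference. Below is a minimal sketch, assuming Hugging Face `transformers`; the checkpoint names and the candidate string are placeholders, not identifiers from the paper.

```python
# Hypothetical sketch: reference-calibrated membership inference.
# "gpt2" stands in for both the victim's fine-tuned model and the
# attacker's released pre-trained model; neither name is from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_loss(model, tokenizer, text):
    """Average per-token cross-entropy of `text` under `model`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def membership_score(finetuned, reference, tokenizer, text):
    """Higher score = fine-tuning lowered the loss on this candidate more
    than the reference predicts, suggesting membership. Using the attacker's
    own pre-trained model as the reference is exactly why a manipulated
    release is dangerous: the calibration is perfect by construction."""
    return (sequence_loss(reference, tokenizer, text)
            - sequence_loss(finetuned, tokenizer, text))

tokenizer = AutoTokenizer.from_pretrained("gpt2")
victim = AutoModelForCausalLM.from_pretrained("gpt2").eval()     # stand-in: fine-tuned model
reference = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # stand-in: attacker's release

candidate = "Alice Smith, account number 0000."  # invented candidate record
print(membership_score(victim, reference, tokenizer, candidate))
```

Thresholding this score over many candidates yields the membership decision; the paper's contribution is making the gap larger by shaping the released model, not the scoring rule itself.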
Evaluated across various datasets and models, PreCurious significantly increases the privacy risk of both membership inference and data extraction, and it amplifies these risks even when the victim applies defenses such as differential privacy and deduplication. The results also demonstrate that the usual privacy-utility trade-off can be broken by manipulating model initialization and the training process.
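For the data-extraction side, a black-box attacker can probe the fine-tuned model by prompting it with de-identified context and inspecting the completion for leaked secrets. This greedy-decoding probe is a common extraction baseline rather than the paper's exact procedure, and the prefix below is invented for illustration:

```python
# Hypothetical extraction probe: prompt the fine-tuned model with known
# de-identified context and read off the greedy completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")             # stand-in
victim = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # stand-in: fine-tuned model

prefix = "Patient: John Doe. Contact number:"  # invented de-identified prefix
inputs = tokenizer(prefix, return_tensors="pt")
with torch.no_grad():
    out = victim.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If the completion reproduces the redacted value, the secret was memorized during fine-tuning despite the dataset's de-identification, which is the failure mode the paper warns about.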