15 Jun 2021 | Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, Colin Raffel
This paper demonstrates that large language models (LLMs) can memorize and leak individual training examples. The authors propose an attack that extracts verbatim sequences from a language model's training data using only black-box query access, and show that some training examples are memorized even though they appear in only a single document in the training set. Demonstrated against GPT-2, a model trained on public Internet data, the attack recovers hundreds of verbatim text sequences, including personally identifiable information (PII), IRC conversations, code, and UUIDs. The authors find that larger models are more vulnerable to these attacks, discuss the ethical implications of carrying them out, and suggest practical mitigations such as differentially private training and careful de-duplication of training data. Although GPT-2's training data is already public, the attack methodology applies to any language model, including models trained on sensitive, non-public data, where the same memorization would constitute a serious privacy violation. The results show that LMs can memorize and regurgitate data that was never intended to be shared, raising concerns about the privacy implications of training ever-larger language models.
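The extraction pipeline the summary describes boils down to "generate many samples from the model, then rank them with a membership-inference signal and inspect the top candidates." Below is a minimal sketch of that idea, assuming the Hugging Face `transformers` GPT-2 checkpoint; the sample count, decoding parameters, and the perplexity-vs-zlib ranking heuristic are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a black-box training-data extraction attack in the spirit of the
# paper: sample sequences from GPT-2, then rank them by a membership signal
# (here: model perplexity relative to zlib-compressed length). Parameters are
# illustrative; the paper generates hundreds of thousands of samples.
import zlib
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the model (lower = more 'memorized-looking')."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def zlib_bits(text: str) -> int:
    """Compressed length in bits: a cheap proxy for how repetitive/predictable the text is."""
    return 8 * len(zlib.compress(text.encode("utf-8")))

# Step 1: generate candidate samples with top-k sampling from an empty prompt.
prompt = tokenizer("<|endoftext|>", return_tensors="pt").input_ids.to(device)
samples = []
for _ in range(20):  # toy number of samples for the sketch
    out = model.generate(prompt, do_sample=True, top_k=40, max_length=256,
                         pad_token_id=tokenizer.eos_token_id)
    samples.append(tokenizer.decode(out[0], skip_special_tokens=True))

# Step 2: rank candidates; unusually low perplexity relative to zlib entropy
# flags likely memorized (verbatim training) text for manual inspection.
scored = sorted(samples, key=lambda s: perplexity(s) / zlib_bits(s))
for text in scored[:5]:
    print(text[:120].replace("\n", " "), "...")
```

The ranking step matters because a language model assigns low perplexity to lots of unremarkable, repetitive text; comparing against a reference signal such as zlib compression (or, in the paper, a smaller language model) helps single out sequences that are surprising in general yet oddly easy for the model, which is the signature of memorization.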