Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

14 Jun 2024 | Abhimanyu Hans, Yuxin Wen, Neel Jain, John Kirchenbauer, Hamid Kazemi, Prajwal Singhania, Siddharth Singh, Gowthami Somepalli, Jonas Geiping, Abhinav Bhatele, Tom Goldstein
This paper introduces the goldfish loss, a simple technique for mitigating memorization in large language models (LLMs). The goldfish loss modifies the next-token prediction objective by excluding a pseudo-random subset of tokens from the loss computation, so the model never trains on a complete verbatim sequence and cannot reproduce it word for word. At inference time, the model must guess the excluded tokens, causing its output to diverge from the training data (a minimal sketch of this masked loss appears below). This approach substantially reduces extractable memorization while having little to no impact on downstream benchmarks.

Experiments on LLaMA-2-7B and TinyLLaMA-1B demonstrate the method's effectiveness: the goldfish loss markedly reduces memorization of training data even when a model is trained on a small set of articles for 100 epochs. The paper also evaluates the loss under standard pretraining conditions and against adversarial extraction attempts. While the goldfish loss is far more effective than standard training at preventing long-form verbatim memorization, it does not completely eliminate the risk of membership inference attacks.

The goldfish loss is compared with other mitigation strategies, including differential privacy and regularization techniques, and is found to prevent memorization more effectively while better maintaining model performance.

The paper also discusses limitations, including leakage under near-duplicated text segments and the need for a suitable hashing scheme to keep the token mask consistent across repeated passages (see the hash-based masking sketch below). Overall, the goldfish loss is presented as a promising, computationally cheap technique for mitigating memorization, particularly in industrial settings where privacy and copyright risks are a concern. The authors conclude that further research is needed to understand how the benefits scale to larger models and to strengthen resistance to membership inference attacks.
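As a rough illustration of the masking mechanism described above, the sketch below shows a next-token cross-entropy in which roughly 1/k of target positions are excluded from the loss. The function name, the static every-k-th-token mask, and the assumption that logits and labels are already aligned for next-token prediction are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, labels, k=4, ignore_index=-100):
    """Masked next-token cross-entropy: roughly 1/k of the target tokens
    are dropped from the loss, so the model never receives a complete
    passage as a supervised target. Assumes logits (batch, seq, vocab)
    and labels (batch, seq) are already shifted/aligned for next-token
    prediction; the static every-k-th-token mask is one simple variant."""
    masked_labels = labels.clone()
    positions = torch.arange(labels.size(1), device=labels.device)
    drop = (positions % k) == (k - 1)          # positions to exclude
    masked_labels[:, drop] = ignore_index      # excluded tokens contribute no gradient
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        masked_labels.reshape(-1),
        ignore_index=ignore_index,
    )
```

Because the excluded positions receive no gradient, the model is never pushed to reproduce the full training sequence, yet the remaining (k-1)/k of tokens still provide nearly the same learning signal as the standard objective.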
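The consistent-masking point from the limitations can be illustrated with a hash-based mask: deciding whether to drop a position from a hash of the few preceding tokens means duplicated or near-duplicated passages are masked at the same positions in every copy, instead of eventually exposing every token across repetitions. The helper below is a hypothetical sketch; the context width, hash function, and parameter names are assumptions rather than the paper's exact recipe.

```python
import hashlib
import torch

def hashed_goldfish_mask(input_ids, k=4, context_width=13):
    """Return a boolean mask of positions to EXCLUDE from the loss.
    A position is dropped when a hash of the preceding `context_width`
    token ids falls in a 1/k bucket, so identical passages are masked
    identically wherever they appear in the corpus."""
    seq = input_ids.tolist()                   # assumes a 1-D tensor of token ids
    drop = [False] * len(seq)
    for i in range(context_width, len(seq)):
        window = ",".join(str(t) for t in seq[i - context_width:i]).encode()
        digest = int.from_bytes(hashlib.sha256(window).digest()[:8], "big")
        drop[i] = (digest % k) == 0
    return torch.tensor(drop, dtype=torch.bool)

# Usage sketch: set the masked targets to ignore_index before the loss.
# labels[hashed_goldfish_mask(input_ids)] = -100
```

Hashing the local context, rather than sampling the mask independently each epoch, is what prevents repeated or near-duplicated documents from leaking the full sequence one "unlucky" token at a time.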