Rethinking LLM Memorization through the Lens of Adversarial Compression

1 Jul 2024 | Avi Schwarzschild, Zhili Feng, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter
This paper introduces the Adversarial Compression Ratio (ACR) as a new metric for assessing memorization in large language models (LLMs). A string from the training data counts as memorized if it can be elicited by a prompt shorter than the string itself: the ACR is defined as the ratio of the length of the target string to the length of the shortest adversarial prompt that elicits it, so an ACR greater than one indicates memorization.

The authors argue that this definition is more intuitive and practical than existing ones, which often rely on exact reproduction or verbatim completion of training data. Because the metric requires only prompting the model, it is also useful for legal and practical purposes, such as helping determine whether a model violates data usage terms. Finally, the authors show that strings can remain highly compressible, and hence memorized under this definition, even after unlearning procedures are applied, suggesting that LLMs may retain data they were instructed to forget. The paper concludes that the ACR offers a more accurate and practical account of memorization in LLMs than existing definitions.
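As a concrete illustration, below is a minimal sketch of the ACR computation in Python. The function names are assumptions for illustration only; the expensive step in the paper, searching for the shortest prompt that makes the model greedily emit the target via discrete optimization over tokens, is not reproduced here, and only the final ratio and decision rule are shown.

```python
# Minimal sketch of the Adversarial Compression Ratio (ACR), assuming the
# shortest eliciting prompt has already been found (e.g., by a token-level
# prompt-optimization search, which is out of scope for this sketch).

def adversarial_compression_ratio(target_len: int, prompt_len: int) -> float:
    """ACR = (# tokens in target string) / (# tokens in shortest eliciting prompt)."""
    if prompt_len <= 0:
        raise ValueError("prompt must contain at least one token")
    return target_len / prompt_len

def is_memorized(target_len: int, prompt_len: int) -> bool:
    # Under the paper's definition, a string is memorized when the shortest
    # eliciting prompt is shorter than the string itself, i.e., ACR > 1.
    return adversarial_compression_ratio(target_len, prompt_len) > 1.0

# Example: a 48-token quotation elicited by a 6-token adversarial prompt
# compresses 8x, so it counts as memorized under this definition.
print(adversarial_compression_ratio(48, 6))  # 8.0
print(is_memorized(48, 6))                   # True
```

The ratio form makes the unlearning result easy to state: if a string's shortest eliciting prompt stays shorter than the string after an unlearning procedure, the ACR stays above one and the string still counts as memorized.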