27 Jun 2024 | Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan
Unlearning is not sufficient for content regulation in advanced generative AI. The paper argues that while unlearning is useful for removing privacy-sensitive information, it is insufficient for preventing models from generating impermissible content. It highlights an underlying inconsistency in the unlearning paradigm when applied to large language models (LLMs): because of in-context learning (ICL), knowledge that has been unlearned from the weights can be reintroduced through the context of an interaction, and the model can then behave as if it still possessed the forgotten knowledge. This calls into question the effectiveness of unlearning as a mechanism for content regulation.
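A minimal sketch of the mechanism, assuming a generic chat-style model behind the scenes (the toy facts and the build_prompt helper below are hypothetical illustrations, not from the paper): an unlearned model is simply handed the "forgotten" facts in its prompt and can reason over them as if it had never unlearned them.

```python
# Toy stand-ins for knowledge that was removed from the model's weights.
UNLEARNED_FACTS = [
    "Compound A reacts with compound B to yield compound C.",
    "Compound C decomposes rapidly above 40 degrees Celsius.",
]

def build_prompt(question: str, context_facts: list[str]) -> str:
    """Prepend in-context facts so the model can draw on knowledge
    it no longer stores in its weights."""
    context = "\n".join(f"- {fact}" for fact in context_facts)
    return (
        "Use only the facts below to answer.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}"
    )

# Without the context, a properly unlearned model should fail or refuse;
# with the context, it can reason over the reintroduced facts ("ununlearning").
question = "What happens if compound C is heated to 50 degrees Celsius?"
print(build_prompt(question, UNLEARNED_FACTS))
```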
The paper introduces the concept of "ununlearning": unlearned knowledge is reintroduced through ICL, so the model can once again generate impermissible content. The authors argue that content filtering mechanisms are therefore necessary to prevent the resurgence of undesirable knowledge. They also discuss the limitations of unlearning itself, including the difficulty of reasoning about how knowledge composes and the challenge of attributing model behaviour to specific pieces of knowledge.
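To illustrate the kind of filtering the authors argue remains necessary, here is a toy sketch of prompt-side and output-side screening around an arbitrary generation callable. The blocked-topic list and the filtered_generate wrapper are hypothetical stand-ins; a real deployment would rely on trained safety classifiers rather than string matching.

```python
# Hypothetical blocked topics for this toy example.
BLOCKED_TOPICS = {"synthesis route", "compound c"}

def violates_policy(text: str) -> bool:
    """Flag text that touches a blocked topic (toy keyword check)."""
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)

def filtered_generate(prompt: str, generate) -> str:
    """Screen the prompt before generation and the completion after it."""
    if violates_policy(prompt):
        return "Request declined by input filter."
    completion = generate(prompt)
    if violates_policy(completion):
        return "Response withheld by output filter."
    return completion

# Usage with dummy generators, just for illustration.
print(filtered_generate("Summarise today's weather.", lambda p: "Sunny and mild."))
print(filtered_generate("Describe the synthesis route for compound C.", lambda p: ""))
```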
The paper distinguishes the types of knowledge available to models, loosely analogous to axioms and theorems: unlearning a derived "theorem" can leave the underlying axiomatic knowledge intact, and that axiomatic knowledge remains usable for other purposes, including recombining it into what was removed. The authors also explore the practical implications, including the need for effective filtering mechanisms and the challenge of attributing knowledge to specific sources.
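As a toy illustration of this compositionality point (a construction for this summary, not the paper's formalism): even if a composed fact is notionally unlearned, simple forward chaining over the remaining "axioms" can re-derive it from facts supplied in context.

```python
# Rules play the role of intact "axioms": (premises) -> conclusion.
AXIOMS = {
    ("A", "B"): "C",       # A and B together yield C
    ("C", "heat"): "D",    # heating C yields D (the notionally "unlearned" theorem)
}

def derive(known_facts: set[str]) -> set[str]:
    """Forward-chain over AXIOMS until no new facts can be added."""
    facts = set(known_facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in AXIOMS.items():
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# "D" was removed, yet it is recoverable from the remaining axioms.
print(derive({"A", "B", "heat"}))  # {'A', 'B', 'heat', 'C', 'D'}
```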
The paper concludes that unlearning is an incomplete solution for removing impermissible knowledge from LLMs with strong ICL capabilities. Ununlearning forces a rethinking of unlearning as a one-size-fits-all remedy and underscores the need for content filtering. The authors also note further limitations of unlearning, including the potential for increased privacy leakage and the difficulty of preventing harmful use of models.