17 Apr 2024 | James Y. Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, Muhao Chen
The paper "Offset Unlearning for Large Language Models" addresses the ethical and legal concerns arising from the memorization of sensitive information by Large Language Models (LLMs). It introduces δ-UNLEARNING, a framework for unlearning problematic data from black-box LLMs without modifying their internal weights. Instead of updating the large model, δ-UNLEARNING learns the logit offset needed for unlearning by contrasting the logits of a pair of smaller, white-box models. This approach supports privacy protection, efficient training, and version control while maintaining or improving performance on out-of-forget-scope tasks. Experiments on the TOFU benchmark show that δ-UNLEARNING effectively unlearns target data while preserving, and in some cases enhancing, general utility. The method is also versatile: it is compatible with various unlearning algorithms and strikes a strong balance between forget quality and model utility. The paper concludes with a discussion of the limitations and future directions of δ-UNLEARNING.
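The inference-time idea summarized above, shifting the black-box model's logits by the offset learned between a small base model and its unlearned copy, can be sketched roughly as follows. This is a minimal illustration of the logit-offset contrast, not the paper's implementation; the function and variable names are assumptions:

```python
import numpy as np

def offset_unlearn_logits(large_logits, small_base_logits, small_unlearned_logits):
    """Apply an offset-unlearning adjustment at inference time (sketch).

    The offset is the difference between the logits of a small model
    fine-tuned to unlearn the target data and those of its unmodified
    base. Adding this offset to the black-box model's logits steers it
    away from the forgotten content without touching its weights.
    """
    delta = small_unlearned_logits - small_base_logits  # learned unlearning offset
    return large_logits + delta                         # adjusted black-box logits

# Toy example over a 3-token vocabulary: the large model favors token 0,
# the small unlearned model shifts probability mass toward token 2.
large = np.array([2.0, 1.0, 0.0])
base = np.array([1.0, 1.0, 1.0])
unlearned = np.array([0.0, 1.0, 2.0])
adjusted = offset_unlearn_logits(large, base, unlearned)
print(adjusted)  # offset flattens the large model's preference here
```

In this toy case the offset exactly cancels the large model's preference, yielding uniform logits; in practice the offset would suppress only tokens associated with the forget set.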