MUSE: Machine Unlearning Six-Way Evaluation for Language Models

14 Jul 2024 | Weijia Shi*, Jaechan Lee*, Yangsibo Huang*, Sadhika Malladi*, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang
The paper introduces MUSE (Machine Unlearning Six-Way Evaluation), a comprehensive benchmark for evaluating machine unlearning algorithms in language models. MUSE aims to address the limitations of existing evaluations by providing a systematic framework that considers both the perspectives of data owners and model deployers. The benchmark evaluates six key properties: no verbatim memorization, no knowledge memorization, no privacy leakage, utility preservation on data not intended for removal, scalability with respect to the size of removal requests, and sustainability over sequential unlearning requests. The evaluation is conducted on two datasets: news articles and the Harry Potter book series. The results show that while most algorithms effectively prevent verbatim and knowledge memorization, they often suffer from severe privacy leakage and fail to preserve model utility. Additionally, existing algorithms struggle with large-scale content removal and successive unlearning requests. The findings highlight the need for more robust unlearning methods and encourage further research in this area. The authors also release their benchmark to facilitate further evaluations and extensions to other modalities.
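To make the first criterion concrete, the sketch below illustrates one plausible way to test for verbatim memorization: prompt the unlearned model with prefixes of forget-set passages and measure how closely its continuations match the true continuations. The function and parameter names (check_verbatim_memorization, generate_fn, forget_examples) are illustrative assumptions, not MUSE's actual API, and the benchmark's exact metric may differ.

```python
# Minimal sketch of a verbatim-memorization check in the spirit of MUSE's
# first criterion. Function and argument names are hypothetical, not the
# benchmark's real interface.
from typing import Callable, List, Tuple

from rouge_score import rouge_scorer  # pip install rouge-score


def check_verbatim_memorization(
    generate_fn: Callable[[str], str],
    forget_examples: List[Tuple[str, str]],
) -> float:
    """Average ROUGE-L recall between the model's continuations and the true
    continuations of forget-set passages. Lower is better: a successfully
    unlearned model should no longer reproduce the removed text verbatim."""
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)
    scores = []
    for prompt, true_continuation in forget_examples:
        model_continuation = generate_fn(prompt)
        score = scorer.score(true_continuation, model_continuation)
        scores.append(score["rougeL"].recall)
    return sum(scores) / max(len(scores), 1)
```

In this framing, the score for a model trained on the forget set would be high, while an ideal unlearned model would score close to a retrained-from-scratch model that never saw the data; the gap between the two is one way to quantify residual memorization.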