Do Membership Inference Attacks Work on Large Language Models?


12 Feb 2024 | Michael Duan*, Anshuman Suri*, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi
This paper investigates the effectiveness of Membership Inference Attacks (MIAs) on large language models (LLMs). Despite extensive research on traditional machine learning models, MIAs have received limited study in the context of LLMs. The authors perform a large-scale evaluation of five MIAs on a suite of LLMs trained on the Pile, ranging from 160M to 12B parameters. They find that MIAs perform poorly, with most attacks barely outperforming random guessing across model sizes and data domains. They attribute this poor performance to the combination of very large training sets and few training iterations, as well as an inherently fuzzy boundary between members and non-members. The study identifies specific settings where LLMs are vulnerable to MIAs and shows that success in these settings can stem from distribution shift, such as when members and non-members are drawn from the same domain but from different temporal ranges. The authors release a unified benchmark package that implements all existing MIAs, providing a resource for future research. The paper also discusses the challenges and implications of MIAs on LLMs, including the need to re-evaluate evaluation settings and the impact of data-domain characteristics on MIA performance.
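To make the attack setting concrete, here is a minimal sketch of the simplest attack family in this line of work: a LOSS-based membership test that flags a candidate text as a training-set member when the model's average per-token loss on it falls below a threshold. The model name and threshold below are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal sketch of a LOSS-based membership inference attack on a causal LM.
# Assumes the HuggingFace `transformers` library; the model and threshold are
# illustrative choices, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/pythia-160m"  # a small Pile-trained model, for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def loss_score(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model.
    Lower loss suggests the text is more likely a training member."""
    ids = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=512).input_ids
    # Passing input_ids as labels makes the model return mean cross-entropy loss.
    return model(ids, labels=ids).loss.item()

def predict_member(text: str, threshold: float = 3.0) -> bool:
    """Flag `text` as a member if its loss is below `threshold`.
    The threshold here is a hypothetical placeholder; in practice it is
    calibrated on known member/non-member data, e.g. to a target
    false-positive rate."""
    return loss_score(text) < threshold
```

The paper's central finding is that scores like this one separate members from non-members only marginally better than chance in the large-dataset, near-one-epoch training regime typical of LLM pretraining.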