Feb 2024 | Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi
Membership inference attacks (MIAs) aim to determine whether a given data point was part of a machine learning model's training data. While MIAs have been shown to be effective in some settings, this study finds that they perform poorly on large language models (LLMs), often scoring close to random guessing. The research evaluates a range of MIAs against language models trained on the Pile, spanning several model sizes and data domains, and identifies several factors behind this low performance.
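To make the setting concrete, the sketch below shows a loss-threshold attack in the spirit of the standard LOSS baseline: score a candidate text by its average negative log-likelihood under the target model and call it a member if the loss falls below a threshold. The model name, maximum length, and threshold are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a loss-threshold MIA against a causal LM.
# Model choice, max_length, and the threshold are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/pythia-160m"  # any Pile-trained causal LM would do here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def loss_score(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the target model."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def predict_member(text: str, threshold: float = 3.0) -> bool:
    # Lower loss suggests the text was seen during training. The threshold
    # here is a hypothetical value; in practice it is calibrated separately.
    return loss_score(text) < threshold
```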
First, the large scale of the training data combined with training for roughly a single epoch leaves little per-example memorization for MIAs to exploit. Second, the boundary between members and non-members is inherently fuzzy in natural-language domains: non-member texts often share substantial n-gram overlap with the training data, which makes them hard to tell apart from members. Consistent with this, the study finds that non-members with low n-gram overlap with members are the ones MIAs can distinguish, suggesting that the difficulty of membership inference largely depends on this overlap.
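As a rough illustration of the overlap argument, the sketch below computes the fraction of a candidate's n-grams that also occur in a member corpus; the whitespace tokenization and the default n are simplifying assumptions rather than the paper's exact procedure.

```python
# Sketch of n-gram overlap: the share of a candidate's n-grams that also
# appear somewhere in the member (training) corpus. Tokenization and n=7
# are illustrative choices.
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_overlap(candidate: str, member_corpus: list[str], n: int = 7) -> float:
    member_ngrams: set[tuple[str, ...]] = set()
    for doc in member_corpus:
        member_ngrams |= ngrams(doc.split(), n)
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    return len(cand & member_ngrams) / len(cand)

# Non-members with low overlap (few shared n-grams) tend to be the ones
# an attack can separate from members.
```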
The research introduces MIMIR, a unified benchmark for evaluating MIAs on LLMs, and shows that the apparent success of MIAs in some settings, where non-members come from the same domain as members but from a different temporal range, is largely attributable to distribution shift rather than genuine membership inference. In other words, distribution shifts between the member and non-member sets can significantly inflate measured MIA performance. The study therefore stresses reporting the n-gram overlap between candidate non-members and the training data when evaluating MIAs, since it directly affects how accurate membership inference appears to be.
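A benchmark in this spirit ultimately reduces to scoring known members and non-members with an attack and summarizing their separability, for example with ROC AUC, where 0.5 corresponds to random guessing. The helper below is a hypothetical sketch of that final step, not MIMIR's actual API.

```python
# Hedged sketch of scoring an attack: gather a membership score for known
# members and non-members (e.g., the negative loss from the earlier sketch,
# or any other attack statistic) and report ROC AUC (0.5 = random guessing).
from sklearn.metrics import roc_auc_score

def attack_auc(member_scores: list[float], non_member_scores: list[float]) -> float:
    """Higher scores are assumed to indicate membership."""
    labels = [1] * len(member_scores) + [0] * len(non_member_scores)
    return roc_auc_score(labels, member_scores + non_member_scores)

# Running the same attack twice, once with non-members sampled from the
# training distribution and once with temporally shifted non-members, makes
# visible how much of the apparent AUC comes from distribution shift alone.
```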
Overall, the findings suggest that MIAs are not effective against LLMs due to the characteristics of their training data and the inherent ambiguity in natural language. The study encourages further research into the evaluation of MIAs and the development of more effective methods for assessing membership inference in generative models.