LLM Dataset Inference: Did you train on my dataset?

2024 | Pratyush Maini, Hengrui Jia, Nicolas Papernot, Adam Dziedzic
This paper investigates the challenges of identifying training data used in large language models (LLMs) and evaluates the effectiveness of membership inference attacks (MIAs) in this context. The authors demonstrate that MIAs often fail to accurately determine whether a given text sequence was part of the model's training data, as they tend to detect distribution shifts rather than actual membership. They propose a new dataset inference method that statistically identifies whether a dataset was used to train an LLM, which is more reliable than MIAs. This method involves aggregating features from multiple MIAs, learning correlations between these features and membership status, and performing a statistical test to distinguish between training and validation sets. The authors show that their method can reliably distinguish between training and validation sets of the Pile dataset with statistically significant p-values less than 0.1, without any false positives. They also highlight the importance of using independent and identically distributed (IID) splits when evaluating MIAs and emphasize the need for careful experimentation to mitigate confounding factors. The paper concludes that while MIAs are not effective for identifying individual text sequence membership, dataset inference offers a more robust and practical approach for determining whether a dataset was used to train an LLM.
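
The summary above describes a three-stage procedure: compute several MIA scores per example as features, learn which features correlate with membership, and then run a statistical test comparing a suspect set against an IID validation set. The sketch below is a minimal illustration of that pipeline under stated assumptions, not the authors' implementation: the MIA feature values are simulated with random numbers, the small systematic shift on the suspect set stands in for a real memorization signal, and the choice of a linear regressor and a one-sided t-test is one plausible instantiation of "learning correlations" and "performing a statistical test".

```python
# Minimal sketch of the dataset-inference procedure described above.
# NOT the authors' code: MIA features are simulated, and the member-like
# shift (-0.05) is an assumption used only to make the example run end to end.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy import stats

rng = np.random.default_rng(0)
n_examples, n_features = 1000, 6  # e.g. loss, zlib ratio, Min-K% prob, ...

# Stage A (simulated): MIA feature matrices for the suspect (possibly
# trained-on) set and an IID validation set never shown to the model.
suspect_feats = rng.normal(0.0, 1.0, (n_examples, n_features)) - 0.05
val_feats = rng.normal(0.0, 1.0, (n_examples, n_features))

# Stage B: on the first halves, fit a model that learns which MIA features
# correlate with membership (label 0 = suspect/member, 1 = validation).
half = n_examples // 2
X_train = np.vstack([suspect_feats[:half], val_feats[:half]])
y_train = np.concatenate([np.zeros(half), np.ones(half)])
regressor = LinearRegression().fit(X_train, y_train)

# Stage C: score the held-out halves and test whether the suspect half
# receives systematically lower (more member-like) aggregated scores.
suspect_scores = regressor.predict(suspect_feats[half:])
val_scores = regressor.predict(val_feats[half:])
t_stat, p_value = stats.ttest_ind(suspect_scores, val_scores,
                                  alternative="less")
print(f"p-value: {p_value:.4f}")  # p < 0.1 would support training-set use
```

The aggregation over many held-out examples is what gives the test its power: any single example's MIA score is too noisy to decide membership, which is exactly the per-sequence failure mode the paper attributes to MIAs.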