Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models

23 May 2024 | Jingyang Zhang, Jingwei Sun, Eric Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Frank Yang, Hai Li
Min-K%++ is a novel, theoretically motivated method for detecting pre-training data in large language models (LLMs). It builds on the insight that, as a consequence of maximum-likelihood training, training samples tend to be local maxima of the modeled distribution along each input dimension. This recasts pre-training data detection as the problem of identifying local maxima: the core idea of Min-K%++ is to determine whether each observed token forms a mode of, or has relatively high probability under, the model's conditional categorical distribution over the vocabulary.

The method is evaluated on two benchmarks, WikiMIA and MIMIR. On WikiMIA, Min-K%++ outperforms the runner-up by 6.2% to 10.5% in detection AUROC averaged over five models. On MIMIR, it consistently improves upon reference-free methods while performing on par with reference-based methods. It also performs best in an online detection setting and is robust to hyperparameter choices. Being theoretically grounded, the approach has potential to generalize to other modalities and models.
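To make the token-scoring idea concrete, below is a minimal sketch of how a Min-K%++ sequence score can be computed from a causal LM's next-token logits. It follows the description above: each token's log-probability is calibrated by the mean and standard deviation of log-probabilities under the model's own next-token distribution (so mode-forming tokens score high), and the lowest k% of token scores are averaged. The function name, tensor shapes, and the epsilon constant are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def min_k_plus_plus_score(logits: torch.Tensor, input_ids: torch.Tensor, k: float = 0.2) -> float:
    """Sketch of a Min-K%++ sequence score (assumed interface).

    logits:    (seq_len, vocab_size) next-token logits from a causal LM,
               where logits[t] is the distribution over input_ids[t + 1].
    input_ids: (seq_len,) token ids of the candidate text.
    k:         fraction of lowest-scoring tokens to average (e.g. 0.2 = 20%).
    """
    log_probs = torch.log_softmax(logits[:-1], dim=-1)  # (seq_len - 1, vocab)
    probs = log_probs.exp()
    targets = input_ids[1:]                             # tokens being predicted

    # log p(x_t | x_<t) for each observed next token
    token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # mean and std of log-probs under the model's conditional distribution
    mu = (probs * log_probs).sum(dim=-1)
    sigma = ((probs * log_probs.pow(2)).sum(dim=-1) - mu.pow(2)).clamp_min(0).sqrt()

    # calibrated per-token score: how mode-like the observed token is
    token_scores = (token_log_probs - mu) / (sigma + 1e-8)

    # aggregate over the k% lowest-scoring tokens
    num = max(1, int(k * token_scores.numel()))
    return token_scores.topk(num, largest=False).values.mean().item()
```

In a membership-inference evaluation, this score would be computed for both member and non-member texts, with higher scores indicating likely training data; detection AUROC is then measured over the two sets of scores.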