Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models

23 May 2024 | Jingyang Zhang, Jingwei Sun, Eric Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Frank Yang, Hai Li
Min-K%++ is a novel, theoretically motivated method for detecting pre-training data in large language models (LLMs). It builds on the insight that, as a consequence of maximum-likelihood training, training samples tend to be local maxima of the modeled distribution along each input dimension. This recasts pre-training data detection as the problem of identifying local maxima: the core idea of Min-K%++ is to determine whether each observed token forms a mode of, or has relatively high probability under, the model's conditional categorical distribution over the vocabulary.

The method is evaluated on two benchmarks, WikiMIA and MIMIR. On WikiMIA, Min-K%++ outperforms the runner-up by 6.2% to 10.5% in detection AUROC averaged over five models. On MIMIR, it consistently improves upon reference-free methods while performing on par with reference-based methods. It also performs best in an online detection setting and is robust to hyperparameter choices. Being theoretically grounded, the approach has potential to generalize to other modalities and models.
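To make the token-scoring idea concrete, below is a minimal sketch of how a Min-K%++ sequence score can be computed from a causal LM's next-token logits. It follows the description above: each token's log-probability is calibrated by the mean and standard deviation of log-probabilities under the model's own next-token distribution (so mode-forming tokens score high), and the lowest k% of token scores are averaged. The function name, tensor shapes, and the epsilon constant are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def min_k_plus_plus_score(logits: torch.Tensor, input_ids: torch.Tensor, k: float = 0.2) -> float:
    """Sketch of a Min-K%++ sequence score (assumed interface).

    logits:    (seq_len, vocab_size) next-token logits from a causal LM,
               where logits[t] is the distribution over input_ids[t + 1].
    input_ids: (seq_len,) token ids of the candidate text.
    k:         fraction of lowest-scoring tokens to average (e.g. 0.2 = 20%).
    """
    log_probs = torch.log_softmax(logits[:-1], dim=-1)  # (seq_len - 1, vocab)
    probs = log_probs.exp()
    targets = input_ids[1:]                             # tokens being predicted

    # log p(x_t | x_<t) for each observed next token
    token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # mean and std of log-probs under the model's conditional distribution
    mu = (probs * log_probs).sum(dim=-1)
    sigma = ((probs * log_probs.pow(2)).sum(dim=-1) - mu.pow(2)).clamp_min(0).sqrt()

    # calibrated per-token score: how mode-like the observed token is
    token_scores = (token_log_probs - mu) / (sigma + 1e-8)

    # aggregate over the k% lowest-scoring tokens
    num = max(1, int(k * token_scores.numel()))
    return token_scores.topk(num, largest=False).values.mean().item()
```

In a membership-inference evaluation, this score would be computed for both member and non-member texts, with higher scores indicating likely training data; detection AUROC is then measured over the two sets of scores.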