Proving membership in LLM pretraining data via data watermarks

17 Aug 2024 | Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia
This paper proposes data watermarks for detecting whether a rights holder's documents were used to train large language models (LLMs). The authors introduce two types of data watermarks: one that appends random character sequences and one that substitutes ASCII characters with Unicode lookalikes. They frame detection as a hypothesis test, which provides statistical guarantees on the false detection rate. The study explores how aspects of watermark design, such as length, number of duplications, and interference, affect the power of the hypothesis test. The authors also examine how watermark strength changes under model and dataset scaling: increasing the dataset size weakens the watermark, but watermarks remain strong if the model size increases as well. Finally, they demonstrate the feasibility of data watermarks on a 176-billion-parameter LLM by detecting SHA hashes from BLOOM-176B's training data, showing that hashes can be robustly detected if they occurred more than 90 times. The results suggest that data watermarks can enable principled detection of dataset membership in real-world applications.
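The hypothesis-test framing can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `detection_p_value`, the Gaussian null, and the score values are all hypothetical. In practice, the score would be the model's loss on the published watermark, and the null distribution would come from scoring random watermarks the model never trained on.

```python
import random


def detection_p_value(watermark_score, null_scores):
    """One-sided p-value: fraction of null (never-trained) watermark scores
    at least as extreme (lower loss suggests memorization) as the observed
    score, with a +1 correction so the p-value is never exactly zero."""
    extreme = sum(1 for s in null_scores if s <= watermark_score)
    return (extreme + 1) / (len(null_scores) + 1)


# Simulated example (assumed numbers): average token losses for random
# watermarks the model never saw, drawn from a hypothetical null.
random.seed(0)
null_scores = [random.gauss(5.0, 0.3) for _ in range(999)]

# Hypothetical observed loss on the published watermark; a much lower
# loss than the null suggests the model memorized it during training.
watermark_score = 3.2

p = detection_p_value(watermark_score, null_scores)
```

Because the p-value is computed against a null distribution built from watermarks known to be absent from training, a detection threshold (e.g., p < 0.05) directly bounds the false detection rate, which is the statistical guarantee the abstract refers to.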