DE-COP: Detecting Copyrighted Content in Language Models Training Data


2024 | André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li
DE-COP is a novel method for detecting whether copyrighted content was included in the training data of language models. It poses multiple-choice questions that ask a model to distinguish a verbatim passage from its paraphrases, the core idea being that a model is more likely to pick out the verbatim text if that text appeared in its training data. DE-COP is evaluated on BookTection, a benchmark of excerpts from 165 books published before and after a model's training cutoff, together with their paraphrases.

Experiments show that DE-COP outperforms prior methods by 9.6% in detection performance (AUC) on models whose logits are available, and reaches an average accuracy of 72% at detecting suspect books on fully black-box models, where prior methods achieve roughly 4%. The method includes a calibration step that minimizes selection bias in the label probabilities, is applicable to any LLM, and detects a substantial amount of potentially copyrighted content in the training data. Tested across several model families, it shows consistent improvements in detection performance, and the results indicate that models tend to identify most accurately the content they were trained on, supporting the hypothesis that the high accuracy on suspect books stems from their inclusion in the training data. The study also highlights the importance of detecting copyrighted content in training data, both to ensure compliance with copyright law and to provide accountability for content authors.
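As a concrete illustration of the probing idea, the sketch below builds one multiple-choice question per excerpt and measures how often the model under test selects the verbatim passage. It is a minimal sketch written for this summary, not the authors' released code: `ask_model` is a hypothetical callable that sends a prompt to the LLM and returns its textual answer, each excerpt is assumed to come with three paraphrases, and the number of shuffled orderings is an arbitrary choice.

```python
import random
from typing import Callable, Sequence

ANSWER_LABELS = ["A", "B", "C", "D"]


def build_probe_prompt(options: Sequence[str], title: str, author: str) -> str:
    """Format one multiple-choice question: which option is the verbatim excerpt?"""
    lines = [
        f'Question: Which of the following passages is verbatim text from "{title}" by {author}?'
    ]
    for label, text in zip(ANSWER_LABELS, options):
        lines.append(f"{label}. {text}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def verbatim_detection_accuracy(
    excerpts: Sequence[str],
    paraphrases: Sequence[Sequence[str]],
    title: str,
    author: str,
    ask_model: Callable[[str], str],
    shuffles_per_excerpt: int = 4,
    seed: int = 0,
) -> float:
    """Fraction of questions where the model picks the verbatim passage.

    Each excerpt is paired with three paraphrases, and the four candidates are
    presented in several shuffled orders so every answer position gets used.
    """
    rng = random.Random(seed)
    correct = total = 0
    for original, fakes in zip(excerpts, paraphrases):
        for _ in range(shuffles_per_excerpt):
            options = [original, *fakes]
            rng.shuffle(options)
            gold = ANSWER_LABELS[options.index(original)]
            answer = ask_model(build_probe_prompt(options, title, author)).strip().upper()
            correct += int(answer.startswith(gold))
            total += 1
    return correct / total if total else 0.0
```

Accuracy well above the 25% random-guess baseline, and above the accuracy measured on books published after the model's training cutoff, is the kind of signal DE-COP treats as evidence that a book was seen during training.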
The research contributes to the field of machine learning by developing methodologies for detecting data used to train language models, and it underscores the need for careful consideration of the ethical and legal implications of using copyrighted materials in model training.
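The calibration step mentioned in the summary addresses the tendency of some models to favour particular answer letters regardless of content. The sketch below shows one generic way such a correction could be implemented; it is an assumption-laden illustration rather than the paper's exact procedure, and `ask_model`, the `neutral_prompts`, and the raw per-letter scores are hypothetical placeholders.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

ANSWER_LABELS = ["A", "B", "C", "D"]


def estimate_label_prior(
    ask_model: Callable[[str], str],
    neutral_prompts: Sequence[str],
) -> Dict[str, float]:
    """Estimate how often the model picks each answer letter on prompts where
    no option should be preferred (e.g. four paraphrases, no verbatim text)."""
    counts = Counter()
    for prompt in neutral_prompts:
        answer = ask_model(prompt).strip().upper()[:1]
        if answer in ANSWER_LABELS:
            counts[answer] += 1
    total = sum(counts.values()) or 1
    return {label: counts[label] / total for label in ANSWER_LABELS}


def calibrate_scores(
    raw: Dict[str, float], prior: Dict[str, float], eps: float = 1e-6
) -> Dict[str, float]:
    """Divide each option's raw score by the model's prior preference for its
    letter, then renormalise, so positional/letter bias is discounted."""
    adjusted = {k: v / (prior.get(k, 0.0) + eps) for k, v in raw.items()}
    norm = sum(adjusted.values()) or 1.0
    return {k: v / norm for k, v in adjusted.items()}
```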