RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

10 Jun 2024 | Liam Dugan, Alyssa Hwang, Filip Trhlik, Josh Magnus Ludan, Andrew Zhu, Hainiu Xu, Daphne Ippolito, Chris Callison-Burch
This paper introduces RAID, the largest and most challenging benchmark dataset to date for detecting machine-generated text. RAID contains over 6 million generations spanning 11 generative models, 8 domains, 11 adversarial attacks, and 4 decoding strategies. To build it, the authors sampled human-written text from the 8 domains, constructed corresponding prompts, and then generated continuations under every combination of model, decoding strategy, and adversarial attack.

The paper evaluates 12 detectors (8 open-source and 4 closed-source) on RAID and finds that current detectors are easily fooled by adversarial attacks, changes in sampling strategy, repetition penalties, and unseen generative models. Detectors tend to perform best on the domains and models they were trained on, and several are vulnerable to specific types of adversarial attack. The authors release the data and a public leaderboard to encourage future research.

The paper concludes that while current detectors are not yet robust enough for widespread deployment, there are promising signs of improvement. It argues that robustness should be evaluated explicitly, across a wide range of domains, models, and attacks, and it calls for future shared resources to cover more models, languages, and generation settings.
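To make the evaluation setup concrete, the sketch below shows one way a detector could be scored on a RAID-style dataset and then sliced by the generation settings the benchmark varies (domain, model, decoding strategy, adversarial attack). This is not the authors' evaluation code; the column names, the "human" label convention, and the detector interface are illustrative assumptions.

```python
# Minimal sketch of scoring a detector on a RAID-style dataset.
# Assumptions (not from the paper): columns named "generation", "model",
# "domain", "attack", "decoding"; human-written rows marked with model == "human";
# a detector that maps a text to a probability of being machine-generated.
from typing import Callable
import pandas as pd


def evaluate_detector(
    df: pd.DataFrame,
    detector: Callable[[str], float],  # returns P(machine-generated)
    threshold: float = 0.5,
) -> pd.DataFrame:
    """Score every row, then report accuracy per (domain, model, attack, decoding) slice."""
    df = df.copy()
    df["score"] = df["generation"].apply(detector)
    df["predicted_machine"] = df["score"] >= threshold
    df["is_machine"] = df["model"] != "human"
    df["correct"] = df["predicted_machine"] == df["is_machine"]
    return (
        df.groupby(["domain", "model", "attack", "decoding"])["correct"]
        .mean()
        .rename("accuracy")
        .reset_index()
    )


if __name__ == "__main__":
    # Toy stand-ins for a real detector and a slice of the benchmark.
    toy_detector = lambda text: 0.9 if "as an AI" in text.lower() else 0.1
    toy_df = pd.DataFrame(
        {
            "generation": ["As an AI, I think...", "I walked to the store."],
            "model": ["gpt-4", "human"],
            "domain": ["reviews", "reviews"],
            "attack": ["none", "none"],
            "decoding": ["greedy", "greedy"],
        }
    )
    print(evaluate_detector(toy_df, toy_detector))
```

Slicing accuracy by attack and decoding strategy, rather than reporting a single aggregate number, is what lets a benchmark like RAID expose the robustness gaps the paper describes.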