Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks

18 Feb 2024 | Yichen Wang, Shangbin Feng, Abe Bohan Hou, Xiao Pu, Chao Shen, Xiaoming Liu, Yulia Tsvetkov, Tianxing He
This paper evaluates the robustness of machine-generated text detectors under malicious attacks in four categories: editing, paraphrasing, prompting, and co-generating. The study applies a comprehensive suite of attacks, measures the resulting change in detector performance, and uses metrics such as Levenshtein Edit Distance, Jaro Similarity, Perplexity, MAUVE, and BERTScore to quantify how much each attack alters the text. No detector performs consistently well across all attacks, and performance drops by 35% on average. Watermarking detectors are the most robust, followed by model-based detectors. The paper also identifies specific vulnerabilities in individual detectors and proposes initial defense patches to improve robustness. The findings highlight the need for more robust methods of detecting machine-generated text in realistic scenarios.
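To make the first of the listed metrics concrete, the sketch below computes the Levenshtein edit distance between an original text and an edited copy. It is a minimal illustration of the standard dynamic-programming definition, not the paper's implementation; the function name `levenshtein` and the sample strings are assumptions chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# Hypothetical example: a typo-style edit that swaps two adjacent characters.
original = "machine generated text"
attacked = "machine generaetd text"
distance = levenshtein(original, attacked)
print(distance)
```

A small distance relative to the text length indicates that the edited text stays close to the original at the character level.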