2024 | Chengzhi Mao & Carl Vondrick & Hao Wang & Junfeng Yang
Raidar is a method for detecting AI-generated text by prompting large language models (LLMs) to rewrite a passage and measuring the editing distance between the input and the rewritten output. It exploits the observation that, when asked to rewrite, LLMs modify human-written text more heavily than AI-generated text: LLMs tend to perceive AI-generated text as already high quality and therefore make fewer changes. A small character-level editing distance between the original and rewritten text thus signals AI-generated content. The method is simple, effective, and compatible with black-box LLMs, requiring only word-level outputs. Because Raidar operates on symbolic word outputs rather than deep neural network features, it is more robust, generalizable, and adaptable, and this feature-agnostic design integrates seamlessly with the latest LLMs that expose only word outputs via an API. Focusing on character editing distance also makes the method semantically agnostic, reducing irrelevant and spurious correlations. Importantly, the detector does not require access to the generating model, so model A can detect the output of model B. Visualizations and empirical experiments show that Raidar significantly improves detection on several established paragraph-level benchmarks, advancing state-of-the-art methods by up to 29 points across six datasets and domains: news, creative writing, student essays, code, Yelp reviews, and arXiv papers. Detection also remains robust for text generated by different language models, such as Ada, Text-Davinci-002, Claude, and GPT-3.5, even though the detector was never trained on text from those models.
Additionally, the detection remains robust even when the text generator is aware of the detection mechanism and uses tailored prompts to bypass it. The data and code are available at https://github.com/cvlab-columbia/RaidarLLMDetect.git.
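The rewrite-then-measure idea described above can be sketched in a few lines. This is an illustrative approximation, not the authors' released code: the `rewrite` callable, the rewriting prompt, and the 0.2 threshold are all assumptions standing in for an actual LLM API call and a threshold fit on labeled data.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level editing distance via dynamic programming (two rolling rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def looks_ai_generated(text: str, rewrite, threshold: float = 0.2) -> bool:
    """Flag text as AI-generated when an LLM's rewrite barely changes it.

    `rewrite` is any callable that sends the text to a black-box LLM with a
    prompt such as "Rewrite the following text:" and returns the word output.
    The edit distance is normalized by the input length so scores are
    comparable across passages of different sizes; `threshold` is a
    placeholder that would be tuned on labeled human/AI examples.
    """
    rewritten = rewrite(text)
    ratio = levenshtein(text, rewritten) / max(len(text), 1)
    return ratio < threshold  # few edits -> LLM saw it as "already good" -> likely AI
```

In practice one would tune the threshold, or train a lightweight classifier over edit distances from several rewriting prompts, using labeled examples from the target domain.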