2024 | Chengzhi Mao & Carl Vondrick & Hao Wang & Junfeng Yang
Raidar is a method for detecting AI-generated text by prompting large language models (LLMs) to rewrite a passage and measuring the editing distance between the input and the rewritten output. It exploits the observation that, when asked to rewrite, LLMs modify human-written text more heavily than AI-generated text: LLMs tend to perceive AI-generated text as already high quality and therefore make fewer changes. A small character-level editing distance between the original and rewritten text thus signals AI-generated content. The method is simple, effective, and compatible with black-box LLMs, requiring only word-level outputs. Because Raidar operates on symbolic word outputs rather than deep neural network features, it is more robust, generalizable, and adaptable, and this feature-agnostic design integrates seamlessly with the latest LLMs that expose only word outputs via an API. Focusing on character editing distance also makes the method semantically agnostic, reducing irrelevant and spurious correlations. Importantly, the detector does not require access to the generating model, so model A can detect the output of model B. Visualizations and empirical experiments show that Raidar significantly improves detection on several established paragraph-level benchmarks, advancing state-of-the-art methods by up to 29 points across six datasets and domains: news, creative writing, student essays, code, Yelp reviews, and arXiv papers. Detection also remains robust for text generated by different language models, such as Ada, Text-Davinci-002, Claude, and GPT-3.5, even though the detector was never trained on text from those models.
Additionally, the detection remains robust even when the text generator is aware of the detection mechanism and uses tailored prompts to bypass it. The data and code are available at https://github.com/cvlab-columbia/RaidarLLMDetect.git.
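The rewrite-then-measure idea described above can be sketched in a few lines. This is an illustrative approximation, not the authors' released code: the `rewrite` callable, the rewriting prompt, and the 0.2 threshold are all assumptions standing in for an actual LLM API call and a threshold fit on labeled data.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level editing distance via dynamic programming (two rolling rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def looks_ai_generated(text: str, rewrite, threshold: float = 0.2) -> bool:
    """Flag text as AI-generated when an LLM's rewrite barely changes it.

    `rewrite` is any callable that sends the text to a black-box LLM with a
    prompt such as "Rewrite the following text:" and returns the word output.
    The edit distance is normalized by the input length so scores are
    comparable across passages of different sizes; `threshold` is a
    placeholder that would be tuned on labeled human/AI examples.
    """
    rewritten = rewrite(text)
    ratio = levenshtein(text, rewritten) / max(len(text), 1)
    return ratio < threshold  # few edits -> LLM saw it as "already good" -> likely AI
```

In practice one would tune the threshold, or train a lightweight classifier over edit distances from several rewriting prompts, using labeled examples from the target domain.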