2024 | Rafael Rivera Soto, Kailin Koch, Aleem Khan, Barry Chen, Marcus Bishop, Nicholas Andrews
This paper presents a few-shot method for detecting machine-generated text using representations of writing style estimated from human-authored text. The premise is that the stylistic features which distinguish one human author from another also separate human authors from state-of-the-art large language models such as Llama-2, ChatGPT, and GPT-4. Unlike previous approaches that require training on machine-generated text, the method needs no samples from the generating model and no access to its predictive distribution, which makes it robust to distribution shift when a new, unseen LLM appears. Given only a few example documents from each candidate model, it can also predict which specific language model produced a given document. Evaluations on datasets generated by multiple LLMs across several domains show that the method outperforms existing few-shot learning approaches and standard zero-shot baselines, and it remains effective under paraphrasing attacks. The underlying idea is that writing style is consistent across prompts and that stylistic representations can be learned from large corpora of human-authored text; as a result, the approach generalizes to new LLMs, topics, and domains, making it a practical solution for detecting machine-generated text in a range of applications.
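The few-shot attribution step described above can be illustrated with a minimal sketch: embed a handful of example documents per candidate author (human or LLM), average them into per-author centroids, and assign a query document to the author whose centroid is most similar. The hashed character-n-gram featurizer below is a toy stand-in for the learned style representation used in the paper, and all function names here are hypothetical, not the authors' code.

```python
import numpy as np

def style_vector(text, n=3, dim=512):
    """Toy style representation: hashed character n-gram counts, L2-normalized.
    (A stand-in for a style embedding trained on human-authored text.)"""
    v = np.zeros(dim)
    for i in range(len(text) - n + 1):
        v[hash(text[i:i + n]) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def few_shot_attribute(query, support):
    """support: dict mapping author label -> list of a few example texts.
    Returns the label whose style centroid is most cosine-similar to the query."""
    q = style_vector(query)
    best_label, best_sim = None, -np.inf
    for label, texts in support.items():
        centroid = np.mean([style_vector(t) for t in texts], axis=0)
        centroid = centroid / np.linalg.norm(centroid)
        sim = float(q @ centroid)  # cosine similarity (both vectors unit-norm)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Example: two "authors" with very different surface style.
support = {"model-A": ["aaaa aaaa aaaa aaaa"], "model-B": ["zzzz zzzz zzzz zzzz"]}
print(few_shot_attribute("aaaa aaaa", support))  # attributes to "model-A"
```

In the paper's setting, the centroids would be built from a few documents per LLM, and a learned style encoder would replace the n-gram featurizer; the nearest-centroid decision rule itself is a standard few-shot classification scheme.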