8 May 2024 | Rafael Rivera Soto, Kailin Koch, Aleem Khan, Barry Chen, Marcus Bishop, Nicholas Andrews
The paper "Few-Shot Detection of Machine-Generated Text using Style Representations" addresses the challenge of detecting machine-generated text, particularly from large language models (LLMs), which pose significant risks due to their ability to mimic human writing. The authors propose a novel approach that leverages style representations learned from human-authored text to distinguish between human and machine-generated content. Unlike previous methods that rely on supervised training with confirmed human and machine documents, their approach does not require access to specific LLMs during training. Instead, it uses stylistic representations to identify patterns that are consistent across different authors, including LLMs. The method is effective in distinguishing human authors from LLMs and can predict which LLM generated a given document, even with only a few examples. The paper evaluates the approach in both supervised and few-shot learning settings, demonstrating superior performance compared to existing methods. The authors also explore the robustness of their approach against paraphrasing attacks and provide a detailed experimental setup, including datasets and baselines. The findings highlight the potential of style representations in detecting machine-generated text and offer practical solutions for mitigating LLM abuses.The paper "Few-Shot Detection of Machine-Generated Text using Style Representations" addresses the challenge of detecting machine-generated text, particularly from large language models (LLMs), which pose significant risks due to their ability to mimic human writing. The authors propose a novel approach that leverages style representations learned from human-authored text to distinguish between human and machine-generated content. Unlike previous methods that rely on supervised training with confirmed human and machine documents, their approach does not require access to specific LLMs during training. Instead, it uses stylistic representations to identify patterns that are consistent across different authors, including LLMs. The method is effective in distinguishing human authors from LLMs and can predict which LLM generated a given document, even with only a few examples. The paper evaluates the approach in both supervised and few-shot learning settings, demonstrating superior performance compared to existing methods. The authors also explore the robustness of their approach against paraphrasing attacks and provide a detailed experimental setup, including datasets and baselines. The findings highlight the potential of style representations in detecting machine-generated text and offer practical solutions for mitigating LLM abuses.