Adaptive Text Watermark for Large Language Models

This paper proposes an adaptive text watermarking strategy for large language models (LLMs). It targets a combination of goals that is hard to achieve at once: high-quality watermarked text, robustness, security, and watermark detection that requires no prior knowledge of the prompt or the model. The method adds the watermark only to high-entropy token distributions and leaves low-entropy distributions untouched. To enhance security and limit the impact on text quality, it extracts a logits scaling vector based on the semantics of the text. An adaptive watermark temperature scaling step then perturbs the distribution by scaling the temperature, which further reduces the influence on text quality.

The approach is evaluated on various LLMs. Experiments show robustness comparable to existing watermark methods, including against paraphrase attacks. Spoofing attacks achieve only a low decryption rate, which indicates strong security. The perplexity of the watermarked text stays low and text quality remains similar to that of un-watermarked text. Because detection is agnostic to the original LLM and the prompt, the watermark can be detected without access to either. The paper concludes that the proposed method provides a holistic solution for LLM watermarking, combining strong robustness, high security, and high text quality.
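The entropy-gating idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `entropy_threshold` and `delta` parameters, and the additive form of the perturbation are all assumptions made for illustration, and the scaling vector is a fixed placeholder, whereas the paper derives it from the semantics of the text.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_watermark_logits(logits, scaling_vector,
                              entropy_threshold=1.0, delta=2.0):
    """Perturb logits only when the next-token distribution is high-entropy.

    Hypothetical helper: `entropy_threshold`, `delta`, and the additive
    update are illustrative assumptions, not values from the paper.
    Returns (new_logits, was_watermarked).
    """
    if len(logits) != len(scaling_vector):
        raise ValueError("logits and scaling_vector must have equal length")
    # Low-entropy (near-deterministic) distribution: leave it untouched.
    if shannon_entropy(softmax(logits)) < entropy_threshold:
        return list(logits), False
    # High-entropy distribution: shift logits along the scaling vector.
    return [l + delta * v for l, v in zip(logits, scaling_vector)], True


# A peaked distribution is skipped; a flat one is watermarked.
placeholder_vector = [1, 0, 1, 0]  # assumed stand-in for the semantic vector
peaked_out, peaked_marked = adaptive_watermark_logits(
    [10.0, 0.0, 0.0, 0.0], placeholder_vector)
flat_out, flat_marked = adaptive_watermark_logits(
    [1.0, 1.0, 1.0, 1.0], placeholder_vector)
```

In this sketch the peaked distribution falls below the threshold and passes through unchanged, while the uniform distribution (entropy ln 4) is perturbed. Skipping near-deterministic positions is what keeps the watermark from forcing unlikely tokens where the model has little freedom.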

2024 | Yepeng Liu, Yuheng Bu