9 Jun 2024 | Yijian Lu, Aiwei Liu, Dianzhi Yu, Jingjing Li, Irwin King
This paper proposes an entropy-based text watermarking detection method (EWD) to improve detection of low-entropy texts generated by large language models (LLMs). Current text watermarking algorithms struggle in low-entropy scenarios such as code generation, where the model has little freedom in choosing the next token, so few watermark signals can be embedded and standard detection statistics under-report them. EWD addresses this by assigning higher influence weights to high-entropy tokens during detection, so that the detection score reflects how strongly each token could actually carry the watermark. The method is training-free and fully automated, and it can be applied to texts with arbitrary entropy distributions.
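As a concrete illustration of the quantity EWD weights by, the sketch below computes per-token Shannon entropy from a detector model's logits. This is a minimal, hedged sketch (the function name and tensor shapes are our assumptions, not the paper's code):

```python
import torch

def token_entropies(logits: torch.Tensor) -> torch.Tensor:
    """Per-position Shannon entropy of the model's next-token distribution.

    logits: (seq_len, vocab_size) raw scores from the detector's language
    model. Low entropy marks positions where the model is nearly forced to
    emit one particular token (common in code), so such positions can carry
    little watermark signal.
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)
```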
The paper discusses the limitations of existing methods such as SWEET, which requires a manually chosen entropy threshold and treats all tokens above it equally, ignoring the actual entropy distribution of the text. EWD instead derives a detection weight for every token through a monotonically increasing function of its entropy, so tokens with higher entropy exert greater influence on the detection result, improving accuracy in low-entropy scenarios.
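To make the weighting concrete, here is a sketch of an entropy-weighted z-test in the spirit of EWD. The weight function `entropy + c` is one simple monotonically increasing choice and is our illustrative assumption, as are the function and parameter names; the paper's exact formulation may differ:

```python
import math

def ewd_z_score(entropies, in_green_list, gamma=0.5, c=1.0):
    """Entropy-weighted watermark detection statistic (illustrative sketch).

    entropies: per-token entropies from the detector model.
    in_green_list: per-token booleans, True if the token falls in the
        watermark's green list (recomputed from the watermark key).
    gamma: green-list fraction used at generation time.
    c: hypothetical offset making weights strictly positive.
    """
    weights = [h + c for h in entropies]  # higher entropy -> larger weight
    total = sum(weights)
    green = sum(w for w, g in zip(weights, in_green_list) if g)
    # Weighted one-proportion z-test: under the null hypothesis (no
    # watermark), each token lands in the green list independently with
    # probability gamma, so the weighted green mass has mean gamma * total
    # and variance gamma * (1 - gamma) * sum of squared weights.
    var = gamma * (1.0 - gamma) * sum(w * w for w in weights)
    return (green - gamma * total) / math.sqrt(var)
```

With uniform weights this reduces to the standard unweighted z-test; the entropy-based weights simply discount positions where the watermark could not have been embedded.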
Experiments show that EWD outperforms existing methods at detecting low-entropy texts, achieving higher detection accuracy while maintaining comparable performance in high-entropy scenarios. The method is also robust against back-translation watermark-removal attacks. A theoretical analysis further shows that EWD attains a lower Type-II error rate (the probability of failing to flag watermarked text) than competing methods, confirming its detection advantage.
The proposed method is evaluated on various datasets, including code generation and news report generation. Results show that EWD achieves better detection performance than existing baselines, particularly in low-entropy scenarios. The method is also efficient, with detection speed comparable to other methods. Overall, EWD provides a more accurate and robust solution for detecting watermarked texts generated by LLMs.