6 Jun 2024 | Mingjia Huo*, Sai Ashish Somayajula*, Youwei Liang, Ruisi Zhang, Farinaz Koushanfar, Pengtao Xie
The paper introduces a multi-objective optimization (MOO) approach for watermarking large language models (LLMs), aiming to make AI-generated text reliably detectable while preserving its semantic coherence. The method trains lightweight networks that dynamically adjust the token-specific splitting ratio and watermark logit at each generation step, jointly optimizing a detection objective and a semantic objective. Experiments show that the method outperforms existing watermarking techniques on both detectability and semantic quality across different LLMs. The code is available at <https://github.com/mignonjia/LS_watermark>.

The paper also discusses the limitations of current watermarking methods and the difficulty of balancing detectability against semantic integrity, arguing that token-specific parameters should be adjusted adaptively rather than fixed globally. The method's effectiveness is validated through comprehensive evaluations, including comparisons with leading baselines and robustness tests against paraphrase and copy-paste attacks. The authors conclude by emphasizing the societal implications of their work, noting its potential to deter harmful AI applications and protect intellectual property rights.
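To make the mechanism concrete, below is a minimal sketch of one watermarked decoding step in the spirit of the paper's design: a green/red vocabulary split in which lightweight networks predict a token-specific splitting ratio gamma_t and watermark logit delta_t from the preceding token's embedding. The network architectures, the names GammaNet and DeltaNet, and the toy vocabulary size are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # toy vocabulary; a real LLM has tens of thousands of tokens
EMB_DIM = 32        # toy embedding width

class GammaNet(nn.Module):
    """Lightweight net mapping the previous token's embedding to a
    splitting ratio gamma_t in (0, 1): the fraction of the vocabulary
    placed on the green list at this step."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(EMB_DIM, 16), nn.ReLU(), nn.Linear(16, 1))
    def forward(self, emb):
        return torch.sigmoid(self.mlp(emb))

class DeltaNet(nn.Module):
    """Lightweight net mapping the previous token's embedding to a
    non-negative watermark logit delta_t: the bias added to green tokens."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(EMB_DIM, 16), nn.ReLU(), nn.Linear(16, 1))
    def forward(self, emb):
        return nn.functional.softplus(self.mlp(emb))

embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)
gamma_net, delta_net = GammaNet(), DeltaNet()

@torch.no_grad()
def watermarked_step(logits, prev_token):
    """One watermarked decoding step.

    logits: (VOCAB_SIZE,) raw next-token logits from the language model.
    prev_token: id of the preceding token; it both conditions the
        token-specific (gamma_t, delta_t) and seeds the green-list RNG,
        so a detector holding the key can reproduce the partition.
    """
    emb = embedding(torch.tensor(prev_token))
    gamma_t = gamma_net(emb).item()   # token-specific splitting ratio
    delta_t = delta_net(emb).item()   # token-specific watermark strength

    # Pseudo-random green/red split of the vocabulary, keyed on prev_token.
    g = torch.Generator().manual_seed(prev_token)
    green = torch.randperm(VOCAB_SIZE, generator=g)[: int(gamma_t * VOCAB_SIZE)]

    # Bias green-list tokens by delta_t, then sample the next token.
    biased = logits.clone()
    biased[green] += delta_t
    probs = torch.softmax(biased, dim=-1)
    return torch.multinomial(probs, 1).item(), green

# Toy usage: random logits stand in for an LLM's output distribution.
next_token, green = watermarked_step(torch.randn(VOCAB_SIZE), prev_token=42)
print(next_token, len(green))
```

During MOO training, the detection and semantic objectives would backpropagate into GammaNet and DeltaNet; the sketch above shows only the inference-time stamping step.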
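On the detection side, the standard watermark z-test generalizes naturally to token-specific splitting ratios: under the null hypothesis of unwatermarked text, the number of green-list tokens has mean and variance determined by the per-step ratios. The sketch below assumes this generalized form; the paper's exact statistic and decision thresholds may differ in detail.

```python
import math

def detection_z_score(green_hits, gammas):
    """z-statistic over T scored tokens: green_hits is how many tokens
    landed on their step's green list; gammas are the per-step splitting
    ratios recomputed from the watermark key. For unwatermarked text the
    green count has mean sum(gammas) and variance sum(g * (1 - g))."""
    mu = sum(gammas)
    var = sum(g * (1 - g) for g in gammas)
    return (green_hits - mu) / math.sqrt(var)

# Example: 120 of 200 tokens green with ratios of 0.5 gives z ~= 2.83,
# strong evidence of a watermark.
print(detection_z_score(120, [0.5] * 200))
```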