23 October 2024 | Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, Jamie Hayes, Nidhi Vyas, Majd Al Merey, Jonah Brown-Cohen, Rudy Bunel, Borja Balle, Taylan Cemgil, Zahra Ahmed, Kitty Stacpoole, Ilia Shumailov, Ciprian Baetu, Sven Gowal, Demis Hassabis & Pushmeet Kohli
The article introduces SynthID-Text, a production-ready text watermarking scheme for identifying synthetic text generated by large language models (LLMs). SynthID-Text aims to preserve text quality while enabling high detection accuracy with minimal computational overhead. The scheme uses a novel sampling algorithm, Tournament sampling, and can be integrated with speculative sampling, an efficiency technique commonly used in large-scale production systems. Evaluations across multiple LLMs show that SynthID-Text provides improved detectability compared to existing methods, while standard benchmarks and human evaluations indicate no change in LLM capabilities. A live experiment with nearly 20 million Gemini® responses confirms that text quality is preserved. The availability of SynthID-Text is expected to facilitate further development and responsible use of LLM systems.
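
The summary above names Tournament sampling without describing it. The following is a minimal sketch of the general idea, not the authors' implementation: several candidate tokens are drawn from the model's distribution and then compete in pairs over a few rounds, with each round's winner chosen by a keyed pseudorandom score. The function names (`g_value`, `tournament_sample`), the SHA-256-based scoring function, the single-bit scores, and the default of three rounds are all assumptions made for illustration.

```python
import hashlib
import random


def g_value(token: int, context: tuple, layer: int, key: int) -> int:
    """Hypothetical scoring function: one pseudorandom bit derived from the
    candidate token, the recent context, the tournament layer, and a secret key."""
    payload = f"{key}|{layer}|{context}|{token}".encode()
    return hashlib.sha256(payload).digest()[0] & 1


def tournament_sample(probs, context, key, num_layers=3, rng=None):
    """Pick one token id via a knockout tournament over sampled candidates."""
    if rng is None:
        rng = random.Random()
    token_ids = list(range(len(probs)))
    # Draw 2**num_layers candidates (with replacement) from the model distribution.
    candidates = rng.choices(token_ids, weights=probs, k=2 ** num_layers)
    for layer in range(num_layers):
        winners = []
        # Pair up neighbouring candidates; the higher score advances.
        for a, b in zip(candidates[0::2], candidates[1::2]):
            ga = g_value(a, context, layer, key)
            gb = g_value(b, context, layer, key)
            if ga > gb:
                winners.append(a)
            elif gb > ga:
                winners.append(b)
            else:
                # Ties (including identical tokens) are broken at random.
                winners.append(rng.choice((a, b)))
        candidates = winners
    return candidates[0]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.5, 0.0, 0.3, 0.2]  # toy next-token distribution
    print(tournament_sample(probs, context=(7, 11), key=42, rng=rng))
```

In a sketch like this, a detector holding the same key could recompute the scores for each token of a text and check whether their average is higher than chance would predict. Because every candidate is drawn from the model's own distribution, tokens the model assigns zero probability are never emitted.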