Scalable watermarking for identifying large language model outputs


24 October 2024 | Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, Jamie Hayes, Nidhi Vyas, Majd Al Meray, Jonah Brown-Cohen, Rudy Bunel, Borja Balle, Taylan Cemgil, Zahra Ahmed, Kitty Stacpoole, Ilia Shumailov, Ciprian Baetu, Sven Gowa, Demis Hassabis & Pushmeet Kohli
This paper introduces SynthID-Text, a production-ready text watermarking scheme for large language models (LLMs). The method enables high detection accuracy with minimal latency overhead, preserves text quality, and allows efficient watermark detection without access to the underlying LLM. SynthID-Text integrates watermarking with speculative sampling, an efficiency technique used in production systems, to enable watermarking at scale.

Evaluations across multiple LLMs show that SynthID-Text offers better detectability than comparable methods, with no change in LLM capabilities as measured by standard benchmarks and human ratings. A live experiment with nearly 20 million Gemini responses confirmed that text quality is preserved. The scheme can be configured as non-distortionary, preserving text quality, or distortionary, improving detectability at some cost to text quality. SynthID-Text has been deployed to watermark Gemini and Gemini Advanced, demonstrating its feasibility in large-scale production systems.

The paper also discusses the limitations of generative watermarks, including their vulnerability to attacks and the need for coordination between actors running LLM text-generation services. Overall, the work demonstrates the real-world viability of generative text watermarks and sets a practical milestone for accountable, transparent, and responsible LLM deployment.
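To illustrate the core idea of generative watermarking and model-free detection described above, here is a minimal sketch. SynthID-Text itself uses Tournament sampling over keyed pseudorandom g-values; the secret key, the 4-token context window, and the two-candidate tournament below are simplifying assumptions for exposition, not the paper's exact algorithm.

```python
import hashlib
import random

SECRET_KEY = b"example-key"  # assumption: any shared secret; not from the paper


def g_value(key: bytes, context: tuple, token: int) -> int:
    """Keyed pseudorandom bit for a (context, token) pair."""
    h = hashlib.sha256(key + repr((context, token)).encode()).digest()
    return h[0] & 1


def watermarked_sample(probs: dict, context: tuple, key: bytes = SECRET_KEY) -> int:
    """Simplified tournament step: draw two candidate tokens from the
    model's distribution and keep the one with the higher g-value,
    subtly biasing output toward high-g tokens."""
    tokens, weights = zip(*probs.items())
    a, b = random.choices(tokens, weights=weights, k=2)
    return a if g_value(key, context, a) >= g_value(key, context, b) else b


def detection_score(tokens: list, key: bytes = SECRET_KEY) -> float:
    """Mean g-value over the text. Watermarked text scores above 0.5;
    detection needs only the key and the text, not the underlying LLM."""
    total = 0
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - 4):i])  # sliding context window
        total += g_value(key, context, tok)
    return total / len(tokens)
```

Because `g_value` is a deterministic keyed function, the detector can recompute it for every token without running the model, which is what makes cheap, model-free detection possible in this family of schemes.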