8 Jun 2024 | Abe Bohan Hou*, Jingyu Zhang*, Yichen Wang*, Daniel Khashabi*, Tianxing He*
**k-SEMSTAMP: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text**
**Abstract:**
Recent watermarked generation algorithms inject detectable signatures during language generation to facilitate post-hoc detection. While token-level watermarks are vulnerable to paraphrase attacks, SEMSTAMP (Hou et al., 2023) applies watermarks on the semantic representation of sentences, demonstrating promising robustness. However, SEMSTAMP uses locality-sensitive hashing (LSH) to partition the semantic space, which may lead to a suboptimal trade-off between robustness and speed. This paper proposes k-SEMSTAMP, an enhancement of SEMSTAMP that utilizes k-means clustering to partition the embedding space, improving robustness and sampling efficiency while preserving generation quality.
**Introduction:**
To detect machine-generated text, recent watermarked generation algorithms inject detectable signatures. Token-level watermarks are vulnerable to paraphrase attacks, which motivated paraphrase-robust sentence-level watermarks such as SEMSTAMP. However, SEMSTAMP's random hyperplane partitioning can split semantically similar sentences into different regions, reducing watermark strength. k-SEMSTAMP addresses this by using k-means clustering to partition the semantic space based on the inherent semantic structure of the text domain.
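The splitting problem with random hyperplanes can be illustrated with a minimal sketch. It assumes the common sign-of-dot-product form of LSH, where each hyperplane contributes one bit to a region signature. The function names `random_hyperplanes` and `lsh_signature` and the toy 2-D vectors are hypothetical and not taken from the paper.

```python
import random


def random_hyperplanes(dim, n_planes, seed=0):
    """Draw n_planes random normal vectors (assumed Gaussian) of size dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on the positive side, else 0."""
    return tuple(
        int(sum(a * b for a, b in zip(vec, plane)) > 0) for plane in planes
    )


# Two nearly identical toy "embeddings" that straddle a single hyperplane
# whose normal is (1, 0): they receive different signatures.
plane = [[1.0, 0.0]]
sig_a = lsh_signature([0.05, 1.0], plane)   # (1,)
sig_b = lsh_signature([-0.05, 1.0], plane)  # (0,)
```

Because the hyperplanes are drawn without reference to the data, nothing prevents one from passing between two close embeddings, as in the toy pair above.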
**Approach:**
k-SEMSTAMP partitions the semantic space using k-means clustering, which is more effective than LSH at keeping semantically similar sentences in the same region. The method is initialized on a large dataset from the target text domain: sentence embeddings are clustered, and the resulting clusters divide the semantic space into valid and blocked regions. During generation, k-SEMSTAMP uses rejection sampling to accept only sentences whose embeddings fall into valid regions, which makes the watermark more robust to paraphrase attacks.
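The three steps described here (cluster, split clusters into valid and blocked, rejection-sample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain Lloyd's k-means with squared Euclidean distance, and it picks the valid clusters pseudo-randomly from a seed, since this summary does not specify how valid regions are chosen. All names (`kmeans`, `valid_clusters`, `sample_watermarked`, `propose`, `embed`) are hypothetical.

```python
import random


def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on a list of vectors; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_centroid(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def valid_clusters(k, valid_fraction, seed):
    """Assumption: mark a seeded pseudo-random fraction of clusters as valid."""
    rng = random.Random(seed)
    n_valid = max(1, int(k * valid_fraction))
    return set(rng.sample(range(k), n_valid))


def sample_watermarked(propose, embed, centroids, valid, max_tries=100):
    """Rejection sampling: re-propose until the embedding is in a valid cluster."""
    sentence = None
    for _ in range(max_tries):
        sentence = propose()
        if nearest_centroid(embed(sentence), centroids) in valid:
            return sentence
    return sentence  # budget exhausted: fall back to the last candidate
```

In use, `propose` would wrap a language model's sentence sampler and `embed` a sentence encoder; both are left as callables here so the sketch stays independent of any particular model.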
**Experiments:**
k-SEMSTAMP is evaluated on the RealNews and BookSum datasets using various paraphrase generators. Results show that k-SEMSTAMP is more robust to paraphrase attacks than SEMSTAMP and other baselines, and that it samples more efficiently. k-SEMSTAMP also maintains generation quality, as measured by perplexity, text diversity, and semantic diversity.
**Conclusion:**
k-SEMSTAMP is a simple yet effective enhancement of SEMSTAMP, improving paraphrastic robustness and sampling speed. However, it requires specifying the text domain for initialization, which can affect performance in certain scenarios. Ethical considerations and future research directions are discussed, emphasizing the need for further development in semantic watermarking and adversarially robust methods for AI governance.