This paper introduces Permute-and-Flip (PF), a new decoding method for large language models (LLMs). The PF decoder is designed to balance the tradeoff between perplexity and robustness, achieving up to a 2x better quality-robustness tradeoff than standard sampling methods. The paper also presents a cryptographic watermarking scheme tailored to the PF decoder, which attains low false positive rates and high recall when the generated text has high entropy. The PF decoder significantly outperforms naive sampling in perplexity while retaining robustness and detectability. The paper provides a detailed analysis of the PF decoder's properties, including its robustness, diversity, and watermarking capabilities. Experimental results on various datasets demonstrate the effectiveness of the PF decoder and its watermarking scheme, showing that they achieve the best balance between detection accuracy and perplexity. The code for the PF decoder is available at <https://github.com/XuandongZhao/pf-decoding>.
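
The abstract does not spell out the sampling rule, but the name comes from the permute-and-flip mechanism in differential privacy: visit candidate tokens in a uniformly random order and accept candidate $i$ with probability $\exp((u_i - u_{\max})/T)$, where $u$ are the logits and $T$ is the temperature. The sketch below illustrates that rule under these assumptions; the function name, plain-Python interface, and exact temperature scaling are illustrative, not taken from the paper.

```python
import math
import random

def pf_sample(logits, temperature=1.0):
    """Sketch of permute-and-flip sampling over raw logits.

    Visits candidates in a uniformly random order ("permute") and
    accepts candidate i with probability exp((logits[i] - max) / T)
    ("flip"). Always terminates, since the argmax token is accepted
    with probability exp(0) = 1.
    """
    max_logit = max(logits)
    order = list(range(len(logits)))
    random.shuffle(order)  # the "permute" step
    for i in order:
        # the "flip" step: biased coin keyed to the logit gap
        if random.random() <= math.exp((logits[i] - max_logit) / temperature):
            return i

# Usage: sample a token index from a toy 4-token logit vector.
print(pf_sample([2.0, 1.0, 0.5, -1.0], temperature=0.7))
```

Note how the acceptance probability decays with the gap to the best logit: low-probability tokens are rarely flipped to "accept", which is consistent with the abstract's claim that PF trades off perplexity against robustness more favorably than naive sampling.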