Min P Sampling: Balancing Creativity and Coherence at High Temperature

1 Jul 2024 | Nguyen Nhat Minh, Andrew Baker, Andreas Kirsch, Clement Neo
The paper introduces min-$p$ sampling, a dynamic truncation sampling method designed to balance creativity and coherence in text generation from large language models (LLMs), especially at high temperatures. Existing methods such as top-$p$ sampling struggle to maintain both coherence and creativity at higher temperatures. Min-$p$ sets a minimum base probability threshold that is scaled by the probability of the top candidate token: when the model is confident, high-probability tokens are prioritized, and when it is less confident, more diverse but still plausible options survive truncation.

Experiments on benchmarks such as GPQA, GSM8K, and AlpacaEval Creative Writing demonstrate that min-$p$ improves both the quality and diversity of generated text, even at high temperatures, compared with top-$p$ and other sampling methods. The method has been adopted by multiple open-source LLM implementations and independently assessed by the community, validating its practical utility.

Key contributions include:
1. Introducing min-$p$ sampling, a dynamic truncation method that balances quality and diversity.
2. Demonstrating min-$p$'s advantages over top-$p$ and other samplers on various benchmarks, emphasizing its reduced trade-off between creativity and coherence at high temperatures.

The paper also discusses the background of language modeling, temperature scaling, and top-$p$ sampling, and provides a detailed experimental setup and results. Future work includes evaluating min-$p$ on a wider range of NLP tasks and conducting more rigorous theoretical analysis.
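The thresholding rule described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' reference implementation: the function name `min_p_sample` is made up here, and applying temperature scaling before the min-$p$ filter is an assumption (implementations differ on the ordering).

```python
import numpy as np

def min_p_sample(logits, p_base=0.1, temperature=1.0, rng=None):
    """Illustrative min-p sampling sketch (ordering of temperature
    vs. filtering is an assumption; implementations vary)."""
    rng = rng or np.random.default_rng()
    # Temperature-scale logits and convert to probabilities (stable softmax).
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Dynamic threshold: base threshold scaled by the top token's probability.
    threshold = p_base * probs.max()
    # Truncate tokens below the threshold, renormalize, and sample.
    kept = np.where(probs >= threshold, probs, 0.0)
    kept /= kept.sum()
    return int(rng.choice(len(kept), p=kept))
```

When the model is confident (one dominant logit), the scaled threshold is high and only the top token survives; with a flat distribution the threshold drops and many tokens remain eligible, which is the creativity/coherence trade-off the paper targets.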
[slides and audio] Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs