A Thorough Examination of Decoding Methods in the Era of LLMs

17 Jun 2024 | Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam
This paper provides a comprehensive analysis of decoding methods for large language models (LLMs), evaluating their performance, robustness, and speed across a range of tasks, models, and deployment environments. The study finds that decoding performance is highly task-dependent and is influenced by factors such as alignment, model size, and quantization. The choice of decoding method is therefore not straightforward, and some methods require extensive hyperparameter tuning to reach their best results.

The paper evaluates deterministic methods, including greedy search, beam search, diverse beam search, contrastive search, and frustratingly simple decoding (FSD), alongside stochastic methods such as temperature sampling, top-k sampling, and top-p sampling. Deterministic methods often outperform stochastic ones on closed-ended tasks, while stochastic methods are better suited to open-ended tasks.

The performance of stochastic methods can be improved with self-consistency strategies, which generate multiple sequences and select the most common output. The study also highlights the sensitivity of decoding methods to their hyperparameters and the impact of model size and quantization: larger models and higher-precision quantization settings can improve performance, but they may also reduce the effectiveness of certain decoding methods.
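To make the contrast between the deterministic and stochastic families concrete, the sketch below applies greedy selection, temperature sampling, top-k sampling, and top-p (nucleus) sampling to a single next-token distribution. This is a minimal illustration of the general techniques named above, not the paper's experimental code; the function names and the toy logits are assumptions made for this example.

```python
import numpy as np

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def greedy(logits):
    """Deterministic: always pick the highest-probability token."""
    return int(np.argmax(logits))

def sample_temperature(logits, temperature=1.0, rng=None):
    """Stochastic: sharpen (<1) or flatten (>1) the distribution, then sample."""
    rng = rng or np.random.default_rng()
    probs = softmax(logits / temperature)
    return int(rng.choice(len(logits), p=probs))

def sample_top_k(logits, k=50, rng=None):
    """Stochastic: keep only the k most likely tokens, renormalize, sample."""
    rng = rng or np.random.default_rng()
    top_idx = np.argsort(logits)[-k:]                 # indices of the k largest logits
    probs = softmax(logits[top_idx])
    return int(top_idx[rng.choice(len(top_idx), p=probs)])

def sample_top_p(logits, p=0.9, rng=None):
    """Stochastic (nucleus): keep the smallest token set whose mass >= p, then sample."""
    rng = rng or np.random.default_rng()
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]                   # tokens from most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1  # size of the nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(nucleus[rng.choice(len(nucleus), p=nucleus_probs)])

# Toy next-token distribution over a 10-token vocabulary (assumed values).
logits = np.array([2.0, 1.5, 0.3, 0.1, -0.5, -1.0, -1.2, -2.0, -2.5, -3.0])
rng = np.random.default_rng(0)
print("greedy:     ", greedy(logits))
print("temperature:", sample_temperature(logits, temperature=0.7, rng=rng))
print("top-k:      ", sample_top_k(logits, k=3, rng=rng))
print("top-p:      ", sample_top_p(logits, p=0.9, rng=rng))
```

Greedy search always returns the same token for a given distribution, which is why it behaves well on closed-ended tasks, whereas the three sampling variants trade determinism for diversity by restricting or reshaping the distribution before drawing from it.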
The paper concludes that the choice of decoding method depends on the specific task and model, and that deterministic methods are more reliable for tasks requiring high factual accuracy and precise instruction following. The findings suggest that further research is needed to develop more robust and efficient decoding methods for LLMs.
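As a closing illustration of the self-consistency strategy mentioned above (sampling several candidate outputs and keeping the most common one), here is a minimal sketch. The `generate` callable and the usage example are placeholders standing in for a real stochastic LLM call and task-specific answer parsing; they are not part of the paper.

```python
from collections import Counter

def self_consistency(generate, prompt, num_samples=10):
    """Sample multiple stochastic generations and return the majority answer.

    `generate` is assumed to be a callable that takes a prompt and returns a
    string answer produced with a stochastic decoding method (e.g. temperature
    or top-p sampling); it is a placeholder for an actual model call.
    """
    answers = [generate(prompt).strip() for _ in range(num_samples)]
    most_common, count = Counter(answers).most_common(1)[0]
    return most_common, count / num_samples  # answer and its agreement rate

# Hypothetical usage with a stand-in sampler that would normally call an LLM:
# answer, agreement = self_consistency(my_llm_sample, "What is 17 * 24?", num_samples=20)
```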