Multi-Candidate Speculative Decoding

2024-01-12 | Sen Yang, Shujian Huang*, Xinyu Dai, Jiajun Chen
This paper introduces Multi-Candidate Speculative Decoding (MCSD), a method for speeding up inference in large language models (LLMs). Instead of drafting a single candidate segment, MCSD samples multiple candidate segments from a draft model and verifies them in parallel with the target model. The main challenge is to raise the acceptance rate of candidate tokens while keeping the output distribution identical to that of the target model. The authors propose algorithms for efficient multi-candidate verification that preserve the target model's output distribution. Their approach improves acceptance rates across multiple datasets and models and outperforms standard speculative decoding (SD). The method is evaluated on the LLaMA suite, including its fine-tuned version Vicuna. The paper also discusses how the dataset, fine-tuning, and sampling method affect acceptance rates, and examines how performance varies under different budget configurations.
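To make the verification idea concrete, here is a minimal sketch of how several candidates for one token position could be checked without changing the target distribution. It uses multi-round rejection sampling and assumes the candidates are drawn independently (with replacement) from the draft distribution. The function names `sample`, `verify_candidates`, and `empirical` and the toy distributions are illustrative assumptions and are not taken from the paper, whose actual algorithms work on batched candidate segments and may differ in detail.

```python
import random


def sample(dist, rng):
    """Draw an index from a probability vector."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def verify_candidates(p, q, candidates, rng):
    """Choose one token given target probs p, draft probs q, and draft samples.

    Returns (token, accepted). accepted is False when every candidate was
    rejected and the token was resampled from the residual distribution.
    Assumes each candidate was sampled i.i.d. from q (illustrative setup).
    """
    p = list(p)
    for x in candidates:
        # Accept x with probability min(1, p[x] / q[x]); q[x] > 0 since x ~ q.
        if rng.random() < min(1.0, p[x] / q[x]):
            return x, True
        # Rejected: continue with the normalized residual max(p - q, 0).
        residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
        total = sum(residual)
        if total <= 0.0:  # numerical guard; unreachable in exact arithmetic
            break
        p = [r / total for r in residual]
    # All candidates rejected: sample from what is left of the target.
    return sample(p, rng), False


def empirical(p, q, k, trials=50_000, seed=0):
    """Estimate the output distribution and acceptance rate with k candidates."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    accepted = 0
    for _ in range(trials):
        candidates = [sample(q, rng) for _ in range(k)]
        token, ok = verify_candidates(p, q, candidates, rng)
        counts[token] += 1
        accepted += ok
    return [c / trials for c in counts], accepted / trials


if __name__ == "__main__":
    target = [0.6, 0.3, 0.1]  # toy target distribution (assumption)
    draft = [0.2, 0.3, 0.5]   # toy draft distribution (assumption)
    for k in (1, 3):
        freqs, rate = empirical(target, draft, k)
        print(k, [round(f, 3) for f in freqs], round(rate, 3))
```

With k = 1 this reduces to the accept/reject step of standard speculative sampling. Running the demo with a larger k should show a higher acceptance rate while the empirical output frequencies stay close to the target distribution, which is the trade-off the summary describes.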