19 Jan 2024 | Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li
The paper introduces Early-Stopping Self-Consistency (ESC), a novel sampling process designed to reduce the computational cost of self-consistency (SC) while maintaining or improving performance. SC is a widely used strategy for chain-of-thought reasoning: it samples multiple reasoning paths and selects the most consistent final answer via majority vote. However, SC is computationally expensive, especially for large language models (LLMs), because it requires many full generations per question.
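For reference, plain SC amounts to a majority vote over sampled answers. The minimal sketch below assumes a caller-supplied `sample_path(question)` helper that runs one chain-of-thought sample and returns its final answer; the helper name and the default sample count are illustrative, not taken from the paper.

```python
from collections import Counter

def self_consistency(question, sample_path, n_samples=40):
    """Vanilla self-consistency: sample n reasoning paths and
    return the answer that appears most often (majority vote)."""
    answers = [sample_path(question) for _ in range(n_samples)]
    # most_common(1) returns [(answer, count)] for the top-voted answer.
    return Counter(answers).most_common(1)[0][0]
```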
ESC achieves this by stopping the sampling process early once all answers within a consecutive set of samples (referred to as a "window") agree, which signals high confidence in the predicted answer. This approach significantly reduces the number of samples needed without sacrificing performance. The paper also proposes a control scheme that adjusts the window size and maximum sampling budget per task and model, ensuring an efficient performance-cost trade-off.
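A minimal sketch of this window-based stopping rule is shown below, reusing the same illustrative `sample_path` helper as above. The window size and sampling budget (5 and 40) are placeholder values, and the paper's control scheme for choosing them dynamically is not reproduced here.

```python
from collections import Counter

def early_stopping_self_consistency(question, sample_path,
                                    window_size=5, max_samples=40):
    """Sketch of ESC: draw samples window by window and stop early
    once every answer in the current window agrees."""
    answers = []
    while len(answers) < max_samples:
        window = [sample_path(question) for _ in range(window_size)]
        answers.extend(window)
        if len(set(window)) == 1:   # all answers in the window agree
            break                   # high confidence: stop sampling early
    # Majority vote over everything sampled so far, as in plain SC.
    return Counter(answers).most_common(1)[0][0]
```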
Experiments on various reasoning tasks, including arithmetic, commonsense, and symbolic reasoning, demonstrate that ESC reduces the average number of samples by a significant margin while maintaining or improving performance. The method is evaluated on datasets such as MATH, GSM8K, StrategyQA, CommonsenseQA, Coin Flip, and Last Letters, showing substantial cost savings. Additionally, the control scheme for ESC is shown to be effective in balancing performance and cost across different tasks and models.