ESCAPE SKY-HIGH COST: EARLY-STOPPING SELF-CONSISTENCY FOR MULTI-STEP REASONING

2024 | Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li
This paper proposes Early-Stopping Self-Consistency (ESC), a sampling strategy that significantly reduces the cost of self-consistency (SC) for multi-step reasoning tasks without sacrificing performance. SC, a widely used decoding strategy for chain-of-thought reasoning, samples a preset number of reasoning paths, which incurs a high computational cost. ESC instead draws samples in small sequential windows and stops sampling as soon as all answers within a window agree, reducing the number of samples needed while maintaining performance. The method is model-agnostic and requires no human annotation or additional training.

The paper evaluates ESC on three categories of reasoning tasks (arithmetic, commonsense, and symbolic reasoning) across three language models: GPT-4, GPT-3.5-Turbo, and Llama-2 7B. ESC reduces the average number of samples needed for chain-of-thought reasoning by large margins on six benchmarks: MATH (-33.8%), GSM8K (-80.1%), StrategyQA (-76.8%), CommonsenseQA (-78.5%), Coin Flip (-84.2%), and Last Letters (-67.4%), while maintaining comparable accuracy.

A control scheme for ESC is also derived to dynamically balance performance and cost according to task and model requirements: the first observation window is used to estimate expected sampling cost and performance, enabling adaptive adjustment of the stopping criterion. Theoretical analysis shows that the early-stopping mechanism is highly likely to preserve performance while cutting sampling overhead. Empirically, ESC scales with sampling size, remains robust to different decoding settings and prompts, and is also effective for open-ended generation tasks.
The results demonstrate that ESC can significantly reduce costs while maintaining performance, making it a practical solution for multi-step reasoning tasks.
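The early-stopping loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `sample_answer` is a hypothetical stand-in for one stochastic chain-of-thought generation plus answer extraction, and the window size and sample cap are assumed values.

```python
from collections import Counter
from typing import Callable, List


def early_stopping_self_consistency(
    sample_answer: Callable[[], str],
    window_size: int = 5,      # assumed window size, not the paper's setting
    max_samples: int = 40,     # assumed sampling budget
) -> str:
    """Sample answers in windows; stop early once a window is unanimous."""
    answers: List[str] = []
    while len(answers) < max_samples:
        # Draw one observation window of stochastic samples.
        window = [sample_answer() for _ in range(window_size)]
        answers.extend(window)
        # Early stop: every answer in this window agrees.
        if len(set(window)) == 1:
            break
    # Majority vote over all collected samples, as in standard SC.
    return Counter(answers).most_common(1)[0][0]
```

When the model is confident, a single unanimous window suffices, so only `window_size` samples are drawn instead of the full preset budget; harder questions keep sampling until the cap is reached.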