SELF-CONSISTENCY IMPROVES CHAIN OF THOUGHT REASONING IN LANGUAGE MODELS

7 Mar 2023 | Xuezhi Wang†‡ Jason Wei† Dale Schuurmans† Quoc Le† Ed H. Chi† Sharan Narang† Aakanksha Chowdhery† Denny Zhou‡§
Self-consistency improves chain-of-thought reasoning in language models through a new decoding strategy: instead of greedily decoding a single reasoning path, the model samples a diverse set of reasoning paths and then selects the most consistent final answer by marginalizing out the sampled paths. The approach builds on the intuition that a complex reasoning problem typically admits multiple ways of thinking, all of which lead to the same correct answer.

Self-consistency outperforms greedy chain-of-thought decoding by a striking margin on a range of arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%), and ARC-challenge (+3.9%). The method is simple and unsupervised: it requires no additional training, auxiliary models, or human annotations, and works directly with pre-trained language models. Experiments show consistent gains across model scales and tasks, including tasks where chain-of-thought prompting can otherwise hurt performance relative to standard prompting.

Self-consistency is robust to the choice of sampling strategy and parameters, and remains effective with imperfect prompts and with non-natural-language (e.g., equation-only) reasoning paths. The degree of agreement among sampled answers also provides an uncertainty estimate and improves model calibration. Compared with alternatives such as sample-and-rank, beam search, and ensemble-based methods, self-consistency achieves superior accuracy while yielding more reliable rationales, making it a broadly useful improvement for reasoning with language models.
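The core procedure is easy to sketch in code. The snippet below is a minimal illustration, not the authors' implementation: `sample_fn` is a hypothetical stand-in for any stochastic language-model sampling call (e.g., temperature sampling), and `extract_answer` is a toy parser that treats the last number in a completion as the final answer, a common heuristic for arithmetic benchmarks such as GSM8K.

```python
from collections import Counter
import re

def extract_answer(completion):
    """Toy parser: treat the last number in the completion as the
    final answer (a simple heuristic for arithmetic benchmarks)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None

def self_consistency_answer(sample_fn, prompt, num_paths=40):
    """Self-consistency decoding: sample several chain-of-thought
    reasoning paths for the same prompt, then return the answer the
    most paths agree on (a majority vote that marginalizes out the
    reasoning paths themselves)."""
    answers = []
    for _ in range(num_paths):
        completion = sample_fn(prompt)       # one sampled reasoning path
        answer = extract_answer(completion)  # keep only the final answer
        if answer is not None:
            answers.append(answer)
    return Counter(answers).most_common(1)[0][0] if answers else None
```

A quick usage example with canned completions standing in for a real model, loosely following the paper's running GSM8K-style example (16 eggs, 7 used, the rest sold at $2 each):

```python
import random

# Two paths agree on 18; one inconsistent path says 16,
# so the majority vote almost always returns "18".
paths = [
    "She has 16 - 3 - 4 = 9 eggs left; 9 * $2 = $18. The answer is 18.",
    "16 eggs minus 7 used leaves 9, and 9 * 2 = 18. The answer is 18.",
    "16 - 7 = 8 eggs; 8 * 2 = 16. The answer is 16.",
]
print(self_consistency_answer(lambda p: random.choice(paths), "Q: ...", num_paths=15))
```

Note that the vote is taken over final answers, not over whole completions: two reasoning paths phrased differently still reinforce each other as long as they reach the same answer.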