10 Jan 2023 | Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou
This paper explores the effectiveness of *chain-of-thought prompting* in enhancing the reasoning abilities of large language models. Chain-of-thought prompting involves providing a series of intermediate reasoning steps (a chain of thought) alongside the final answer in few-shot exemplars. The authors demonstrate that this method significantly improves performance on a range of reasoning tasks, including arithmetic, commonsense, and symbolic reasoning. Experiments with several large language models, such as PaLM 540B, show that chain-of-thought prompting can achieve state-of-the-art accuracy on benchmarks like GSM8K, surpassing even finetuned GPT-3 with a verifier. The approach is robust to different annotators, exemplars, and language models, and it facilitates generalization to longer sequences in symbolic reasoning tasks. The paper also discusses limitations, such as the lack of any guarantee that generated reasoning chains are factually correct and the cost of manually annotating exemplars, and suggests directions for future research.
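To make the prompting format concrete, here is a minimal sketch of how a chain-of-thought few-shot prompt can be assembled. The exemplar (tennis balls) and the test question (cafeteria apples) follow the style of examples shown in the paper; `build_cot_prompt` is a hypothetical helper, not code from the authors.

```python
# Minimal sketch of assembling a chain-of-thought few-shot prompt.
# Each exemplar pairs a question with intermediate reasoning steps
# (the "chain of thought") before the final answer.

COT_EXEMPLARS = [
    {
        "question": (
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        "chain_of_thought": (
            "Roger started with 5 balls. 2 cans of 3 tennis balls each "
            "is 6 tennis balls. 5 + 6 = 11."
        ),
        "answer": "11",
    },
]


def build_cot_prompt(exemplars, new_question):
    """Concatenate exemplars (question + reasoning + answer), then the new question."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['chain_of_thought']} The answer is {ex['answer']}.\n"
        )
    # The model is expected to continue with its own reasoning chain after "A:".
    parts.append(f"Q: {new_question}\nA:")
    return "\n".join(parts)


prompt = build_cot_prompt(
    COT_EXEMPLARS,
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?",
)
print(prompt)
```

In standard (answer-only) prompting, the `chain_of_thought` field would simply be omitted; the paper's central finding is that including it elicits step-by-step reasoning in the model's own completions, with the largest gains at large model scales.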