16 Apr 2023 | Denny Zhou†*, Nathanael Schärli†, Le Hou†, Jason Wei†, Nathan Scales†, Xuezhi Wang†, Dale Schuurmans†, Claire Cui†, Olivier Bousquet†, Quoc Le†, Ed Chi†
Least-to-most prompting enables large language models to solve complex reasoning tasks by breaking a problem into simpler subproblems and solving them sequentially. The method has two stages: decomposition, in which a complex problem is broken into a sequence of subproblems, and subproblem solving, in which each subproblem is answered using the answers to the previous ones. Because the prompt's examples can be easier than the problems being solved, the approach outperforms chain-of-thought prompting, especially on tasks that require generalizing to problems harder than those shown in the prompt.

Experiments on symbolic manipulation, compositional generalization, and math reasoning show that least-to-most prompting achieves higher accuracy than chain-of-thought prompting. With the GPT-3 code-davinci-002 model, it solves the SCAN compositional generalization benchmark with at least 99% accuracy using just 14 exemplars, compared to 16% with chain-of-thought prompting. It is similarly effective on last-letter concatenation, generalizing from the short lists seen in the prompt to much longer lists at test time.

Least-to-most prompting requires no training or fine-tuning and can be combined with other prompting techniques. It is particularly useful for multi-step reasoning tasks such as math word problems and compositional generalization, and its strong performance across these benchmarks shows its potential for improving the reasoning capabilities of large language models.
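To make the two-stage procedure concrete, here is a minimal Python sketch. The `complete` callable, the `decompose_prompt` and `solve_prompt` template parameters, and the one-subproblem-per-line output format are assumptions for illustration, not the paper's exact exemplars; the paper implements each stage with few-shot prompts rather than a fixed API.

```python
from typing import Callable, List

def least_to_most(
    question: str,
    complete: Callable[[str], str],  # any text-completion function, e.g. an LLM API wrapper
    decompose_prompt: str,           # few-shot exemplars showing how to decompose problems
    solve_prompt: str,               # few-shot exemplars showing how to solve subproblems
) -> str:
    """A sketch of two-stage least-to-most prompting (prompt formats assumed)."""
    # Stage 1: decomposition. Ask the model to list subproblems,
    # one per line, ordered from easiest to the original question.
    plan = complete(decompose_prompt + f"\nQ: {question}\nSubproblems:")
    subproblems: List[str] = [s.strip() for s in plan.splitlines() if s.strip()]

    # Stage 2: sequential subproblem solving. Each subproblem is answered
    # in a context that includes all previously solved subproblems and
    # their answers, so later steps can build on earlier results.
    context = solve_prompt
    answer = ""
    for sub in subproblems:
        context += f"\nQ: {sub}\nA:"
        answer = complete(context).strip()
        context += f" {answer}"
    return answer  # the answer to the final subproblem is the answer to the original question
```

On last-letter concatenation, for example, the decomposition stage would produce progressively longer sublists of the input ("think", then "think, machine", then "think, machine, learning"), and each solving step appends the last letter of the newly added word to the concatenation obtained in the previous step.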