July 17, 2024 | Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Bäck
This survey explores reasoning with large language models (LLMs), focusing on prompt-based methods for multi-step reasoning. LLMs, trained on large datasets, have achieved breakthroughs in tasks like translation, summarization, and question answering. Recent advances in chain-of-thought prompt learning have enabled LLMs to perform complex reasoning, such as solving grade school math word problems. The paper reviews the rapidly expanding field of prompt-based reasoning, identifying different ways to generate, evaluate, and control multi-step reasoning. It provides in-depth coverage of core approaches and open problems, and proposes a research agenda for the near future. The paper highlights the relationship between reasoning and prompt-based learning, and connects reasoning to sequential decision processes and reinforcement learning. It finds that self-improvement, self-reflection, and some metacognitive abilities of the reasoning process are possible through the judicious use of prompts. True self-improvement and self-reasoning, to go from reasoning with LLMs to reasoning by LLMs, remain future work.
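To make the chain-of-thought idea concrete, here is a minimal Python sketch that builds a few-shot CoT prompt in the style of Wei et al. (2022): a worked example with explicit intermediate steps is prepended to the new question so the model imitates step-by-step reasoning. The `ask_llm` call is a hypothetical placeholder for whatever completion API is actually used.

```python
# A minimal sketch of chain-of-thought (CoT) prompting in the style of
# Wei et al. (2022). `ask_llm` is a hypothetical stand-in for a real
# completion API; everything else is plain Python.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked example so the model imitates step-by-step reasoning."""
    return f"{COT_EXEMPLAR}\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "A cafeteria had 23 apples. They used 20 for lunch and bought 6 more. "
    "How many apples do they have?"
)
# answer = ask_llm(prompt)  # hypothetical LLM call; a CoT-primed model
#                           # typically ends with "The answer is 9."
print(prompt)
```

The key design point is that no parameters change: the entire "learning" signal is the exemplar in the prompt, which is what distinguishes prompt-based reasoning from fine-tuning.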
The paper discusses the training pipeline of LLMs, covering data preparation, pretraining, fine-tuning, instruction tuning, preference alignment, optimization, and inference. In-context learning, also known as prompt-based learning, is a form of few-shot learning that emerges in LLMs with hundreds of billions of parameters: the model learns a task from examples provided in the prompt and can produce correct answers without any parameter updates.

The survey covers benchmarks used to evaluate reasoning performance, including GSM8K, ASDiv, MAWPS, SVAMP, and AQuA, reviews the papers that use them, and organizes the field around how reasoning steps are generated, evaluated, and controlled. For step generation it identifies three main approaches: hand-written prompts, prompts using external knowledge, and model-generated prompts. For step evaluation it discusses self-assessment, tool-based validation, and validation by an external model. For step control it discusses greedy selection, ensemble strategies, and reinforcement learning. The survey also relates reasoning to neighboring topics such as self-reflection, metacognition, and artificial general intelligence, and concludes with a research agenda for future work on reasoning with LLMs.
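As a toy illustration of the tool-based validation idea from the step-evaluation discussion, the sketch below re-checks the arithmetic claims inside a generated reasoning step with Python itself, loosely in the spirit of program-aided approaches. The `a <op> b = c` step format and the regular expression are illustrative assumptions, not the survey's specific method.

```python
# A toy sketch of tool-based step validation: recompute every
# "a <op> b = c" arithmetic claim found in a reasoning step and
# flag the step if any claim is wrong. The step format and regex
# are illustrative assumptions.

import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def check_arithmetic(step: str) -> bool:
    """Verify every 'a <op> b = c' claim found in a reasoning step."""
    for a, op, b, claimed in re.findall(r"(\d+)\s*([+\-*/])\s*(\d+)\s*=\s*(\d+)", step):
        if OPS[op](int(a), int(b)) != int(claimed):
            return False
    return True

print(check_arithmetic("2 cans of 3 balls each is 6 balls. 5 + 6 = 11."))  # True
print(check_arithmetic("5 + 6 = 12."))                                     # False
```

A faulty step can then be rejected or regenerated rather than propagated into later reasoning, which is the core motivation for evaluating steps at all.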
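Among the ensemble strategies mentioned for step control, self-consistency (Wang et al., 2023) is the canonical example: sample several reasoning chains at a nonzero temperature and take a majority vote over their final answers. Below is a minimal sketch, assuming a hypothetical `sample_chain` function that returns one sampled completion ending in "The answer is X."

```python
# A minimal sketch of self-consistency (Wang et al., 2023): majority-vote
# the final answers of several independently sampled reasoning chains.
# `sample_chain` is a hypothetical stand-in for a sampled LLM completion.

from collections import Counter
import re

def final_answer(chain: str) -> str | None:
    """Pull the answer out of a chain ending in 'The answer is X.'"""
    match = re.search(r"The answer is\s+(-?\d+)", chain)
    return match.group(1) if match else None

def self_consistency(question: str, sample_chain, n: int = 10) -> str | None:
    """Majority vote over n independently sampled reasoning chains."""
    votes = Counter()
    for _ in range(n):
        answer = final_answer(sample_chain(question))
        if answer is not None:
            votes[answer] += 1
    return votes.most_common(1)[0][0] if votes else None

# Demo with a fake sampler that is right 7 times out of 10:
fake = iter(["The answer is 11."] * 7 + ["The answer is 12."] * 3)
print(self_consistency("(question)", lambda q: next(fake)))  # -> "11"
```

The vote acts as a simple control mechanism over the space of reasoning paths: individual chains may derail, but the mode of their answers is considerably more reliable than any single greedy decode.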