From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

23 May 2024 | Yuntian Deng, Yejin Choi, Stuart Shieber
This paper introduces Stepwise Internalization, a method for enabling implicit chain-of-thought (CoT) reasoning in language models. Training begins from a model finetuned for explicit CoT reasoning; the intermediate reasoning tokens are then removed gradually over the course of further finetuning, forcing the model to internalize the reasoning process in its hidden states rather than emitting it token by token.

The approach lets a GPT-2 Small model solve 9-by-9 multiplication with up to 99% accuracy, whereas standard no-CoT training fails beyond 4-by-4 multiplication. It also scales to larger models: Mistral 7B trained this way achieves over 50% accuracy on GSM8K without producing any intermediate steps.

The authors compare Stepwise Internalization (ICoT-SI) against No CoT, Explicit CoT, and ICoT-KD (implicit CoT via knowledge distillation) baselines. ICoT-SI outperforms the other implicit methods in both accuracy and efficiency: the same GPT-2 Small model reaches 99% on 9-by-9 multiplication, a task where ICoT-KD fails even at 5-by-5, and the Mistral 7B result on GSM8K surpasses GPT-4 prompted without chain-of-thought. Because the method internalizes CoT in a general, task-agnostic way, it applies beyond arithmetic to tasks such as grade-school math word problems.
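To make the training procedure concrete, the sketch below implements the removal curriculum in miniature. It assumes examples tokenized into (question, CoT, answer) segments and a fixed number of CoT tokens dropped per stage; this data layout and the linear schedule are illustrative assumptions, not the paper's exact recipe.

# A minimal sketch of the Stepwise Internalization curriculum. The
# (question, cot, answer) layout and the linear removal schedule are
# simplifying assumptions for illustration.

from dataclasses import dataclass
from typing import List

@dataclass
class Example:
    question: List[int]  # token ids of the input problem
    cot: List[int]       # token ids of the intermediate reasoning steps
    answer: List[int]    # token ids of the final answer

def build_target(ex: Example, stage: int, removed_per_stage: int) -> List[int]:
    """Return the training sequence for a given curriculum stage.

    Stage 0 reproduces ordinary explicit-CoT training; each later stage
    drops `removed_per_stage` more tokens from the front of the CoT, and
    once the whole CoT is gone the model is trained to emit the answer
    directly (implicit CoT).
    """
    k = min(stage * removed_per_stage, len(ex.cot))
    return ex.question + ex.cot[k:] + ex.answer

# Toy usage: watch the CoT shrink stage by stage (fake token ids).
ex = Example(question=[101, 102], cot=[1, 2, 3, 4, 5, 6], answer=[900])
for stage in range(4):
    print(f"stage {stage}: {build_target(ex, stage, removed_per_stage=2)}")

In the full method, the model would be finetuned at each stage before the next chunk of CoT tokens is removed, which is where the training cost and schedule sensitivity discussed below come from.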
The paper also examines the trade-off between accuracy and speed. Implicit CoT methods still lag explicit CoT in accuracy on the hardest tasks, but they offer large speed advantages at inference time: on 9-by-9 multiplication, ICoT-SI is comparable in accuracy to explicit CoT while running about 11 times faster, since no intermediate tokens are generated. The main limitations are higher training cost, because the model is finetuned repeatedly as reasoning steps are removed, and potential instability when the removal schedule is too aggressive. Overall, Stepwise Internalization delivers a compelling accuracy-latency trade-off and a practical route to implicit CoT reasoning for applications that require both high performance and low latency.
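The reported speedup follows from autoregressive decoding cost scaling roughly with the number of generated tokens. A back-of-envelope check, with token counts that are illustrative assumptions rather than figures from the paper:

# Decoding cost grows with the number of generated tokens, so skipping the
# CoT cuts latency roughly in proportion to output length. The counts below
# are assumed for illustration, not measured in the paper.
explicit_tokens = 200  # assumed: intermediate steps plus answer for 9-by-9 multiplication
implicit_tokens = 18   # assumed: answer only (a 9-digit by 9-digit product has up to 18 digits)
print(f"approximate speedup: {explicit_tokens / implicit_tokens:.1f}x")  # ~11.1x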