Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

18 Mar 2024 | Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman
Quiet-STaR is a method that teaches language models (LMs) to reason by generating an internal rationale at each token of the text, letting the model "think before speaking" and better predict future text. The method generates rationales in parallel across token positions, mixes the next-token predictions made with and without a rationale via a learned mixing head, and optimizes rationale generation with a REINFORCE-based reward. Quiet-STaR addresses several challenges: the computational cost of generating a continuation at every token, the fact that the model initially does not know how to generate or use internal thoughts, and the need to predict more than just the single next token. It introduces a tokenwise parallel sampling algorithm and an extended teacher-forcing technique to handle these.

The results show that Quiet-STaR improves zero-shot performance on tasks such as GSM8K and CommonsenseQA without task-specific fine-tuning. This demonstrates that LMs can learn to reason more effectively by leveraging the reasoning implicit in ordinary text, and Quiet-STaR represents a step toward more general and scalable reasoning in language models.
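The two core ideas above, mixing predictions made with and without a rationale, and rewarding rationales by how much they improve next-token prediction, can be sketched conceptually. This is a hypothetical NumPy illustration of the general mechanism, not the authors' implementation; the function names, the scalar mixing weight `w`, and the toy vocabulary are all assumptions for clarity (in the actual method the mixing weight comes from a learned head and the reward drives REINFORCE updates).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def mix_predictions(logits_base, logits_with_thought, w):
    """Interpolate next-token distributions predicted without and
    with an internal rationale, using mixing weight w in [0, 1]
    (in Quiet-STaR, w would come from a learned mixing head)."""
    p_base = softmax(logits_base)
    p_thought = softmax(logits_with_thought)
    return (1 - w) * p_base + w * p_thought

def rationale_reward(logits_base, logits_with_thought, true_token):
    """REINFORCE-style reward signal: the improvement in
    log-likelihood of the true next token when the rationale
    is used, relative to predicting without it."""
    lp_base = np.log(softmax(logits_base)[true_token])
    lp_thought = np.log(softmax(logits_with_thought)[true_token])
    return lp_thought - lp_base

# Toy example with a 5-token vocabulary.
rng = np.random.default_rng(0)
base = rng.normal(size=5)
thought = base.copy()
thought[2] += 2.0  # pretend the rationale boosts the correct token

mixed = mix_predictions(base, thought, w=0.5)
assert abs(mixed.sum() - 1.0) < 1e-9      # still a valid distribution
assert rationale_reward(base, thought, true_token=2) > 0  # helpful thought
```

A rationale that raises the probability of the true next token earns a positive reward, so the model is pushed toward generating thoughts that actually help prediction; unhelpful thoughts earn zero or negative reward.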