6 Jun 2024 | Yunxiang Zhang, Muhammad Khalifa, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lajanugen Logeswaran, Lu Wang
This paper explores the ability of small language models (LMs) to self-correct their reasoning on tasks with minimal input from stronger LMs. The authors propose a novel pipeline, SCORE, which leverages correct solutions to guide smaller LMs in critiquing their incorrect responses. The generated critiques are then used for supervised fine-tuning to enhance the models' self-correction abilities. Experimental results show that SCORE significantly improves the self-correction performance of two small models on five datasets, particularly when paired with a strong GPT-4-based verifier. However, the effectiveness is limited when using a weak self-verifier. The study highlights the importance of strong verifiers in enabling small LMs to effectively self-correct and suggests future research directions to improve reasoning verification.