MULTI-STEP PROBLEM SOLVING THROUGH A VERIFIER: AN EMPIRICAL ANALYSIS ON MODEL-INDUCED PROCESS SUPERVISION

5 Feb 2024 | Zihan Wang1*, Yunxuan Li2†, Yuexin Wu2, Liangchen Luo2, Le Hou2, Hongkun Yu2, Jingbo Shang1
This paper introduces Model-induced Process Supervision (MiPS), a method for automating data curation in multi-step problem solving. Training a verifier on intermediate solution steps normally requires ground-truth annotations for each step, which are expensive to collect by hand. MiPS instead annotates an intermediate step by Monte Carlo sampling: it samples completions of the partial solution from a reasoning model and uses the fraction of those completions that reach a correct answer as the step's label. Because the reasoning model can still make mistakes after a correct step, this estimate tends to underestimate the accuracy of intermediate steps, and the authors conclude that verification should focus on high predicted values rather than low ones. The method significantly improves the performance of PaLM 2 on math and coding tasks, and the trained verifier generalizes well across different reasoning models. The paper also provides an empirical analysis of the design choices and properties of the trained verifier, highlighting the benefits of process supervision data and the impact of noise on generalization.
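The Monte Carlo annotation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `mips_step_scores`, `aggregate_scores`, `complete`, and `is_correct` are hypothetical, the reasoning model is represented by a plain callable, and the `max` aggregation is only one possible way to "focus on high predicted values"; the summary does not say which aggregation the paper uses.

```python
from typing import Callable, List


def mips_step_scores(
    problem: str,
    steps: List[str],
    complete: Callable[[str, List[str]], str],
    is_correct: Callable[[str], bool],
    num_samples: int = 8,
) -> List[float]:
    """Label each solution prefix with the fraction of sampled completions
    that end in a correct final answer.

    `complete` (hypothetical) stands in for the reasoning model: given the
    problem and a prefix of steps, it returns one sampled final answer.
    `is_correct` (hypothetical) checks a final answer against ground truth.
    """
    scores = []
    for i in range(1, len(steps) + 1):
        prefix = steps[:i]
        hits = sum(
            1 for _ in range(num_samples) if is_correct(complete(problem, prefix))
        )
        scores.append(hits / num_samples)
    return scores


def aggregate_scores(step_scores: List[float], mode: str = "max") -> float:
    """Combine per-step scores into one solution-level score.

    "max" emphasizes high predicted values; "min" emphasizes low ones.
    Both are illustrative choices, not taken from the paper.
    """
    if not step_scores:
        raise ValueError("step_scores must not be empty")
    if mode == "max":
        return max(step_scores)
    if mode == "min":
        return min(step_scores)
    raise ValueError(f"unknown aggregation mode: {mode}")


if __name__ == "__main__":
    # Toy stand-in for a reasoning model: any prefix containing a step
    # marked "wrong" leads to a wrong final answer.
    def toy_complete(problem: str, prefix: List[str]) -> str:
        return "5" if any("wrong" in step for step in prefix) else "4"

    labels = mips_step_scores(
        "What is 2 + 2?",
        ["add 2 and 2", "wrong: add 1 more"],
        toy_complete,
        lambda answer: answer == "4",
        num_samples=4,
    )
    print(labels)
    print(aggregate_scores(labels, "max"))
```

With a real reasoning model the sampled completions are stochastic, so a correct step can receive a label below 1.0 purely because of model errors in the continuation. That is the underestimation effect the summary describes.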