HelpSteer2: Open-source dataset for training top-performing reward models


12 Jun 2024 | Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev
NVIDIA researchers have released HelpSteer2, a permissively licensed (CC-BY-4.0) dataset designed to train state-of-the-art reward models. At only 10,000 response pairs, it is far smaller than existing preference datasets such as HH-RLHF, yet it remains highly effective for reward-model training. Prompts are sourced primarily from ShareGPT, a platform where users share their conversations, and include multi-turn prompts so that the resulting reward models handle multi-turn conversations well. Each response is annotated with five attributes (helpfulness, correctness, coherence, complexity, and verbosity), giving the dataset its multi-attribute character.

The researchers trained reward models on the Llama 3 70B base model and an in-house Nemotron-4 340B base model. These models achieved a state-of-the-art score of 92.0% on Reward-Bench's primary dataset, outperforming both open and proprietary models as of June 12, 2024. They also propose SteerLM 2.0, a novel model-alignment approach that leverages the rich multi-attribute scores predicted by the reward models to train models to follow complex, multi-requirement instructions.

The paper details the dataset collection process, including the use of diverse prompts and response-generation methods, as well as the annotation process used to ensure high-quality data. Models trained with HelpSteer2 perform well across a range of metrics, particularly in the Chat-Hard category, and demonstrate the effectiveness of the proposed alignment methods. The dataset and code are publicly available, encouraging further development and application in AI alignment.
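To make the multi-attribute scores concrete, here is a minimal sketch of loading the dataset and collapsing the per-response attribute ratings into a single scalar reward signal. It assumes the dataset is published on the Hugging Face Hub as `nvidia/HelpSteer2` with 0-4 ratings for the five attributes; the attribute weights below are purely illustrative, not the weighting used in the paper.

```python
# Minimal sketch: combine HelpSteer2's multi-attribute ratings into one scalar reward.
# Assumes the Hugging Face Hub dataset id "nvidia/HelpSteer2" with fields
# helpfulness, correctness, coherence, complexity, verbosity (0-4 each).
from datasets import load_dataset

ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]
# Hypothetical weights for illustration only; the paper's reward models learn
# to predict the attributes and combine them differently.
WEIGHTS = {"helpfulness": 1.0, "correctness": 1.0, "coherence": 0.5,
           "complexity": 0.0, "verbosity": -0.5}

def scalar_reward(example):
    """Weighted sum of attribute ratings -> one scalar per response."""
    example["reward"] = sum(WEIGHTS[a] * example[a] for a in ATTRIBUTES)
    return example

ds = load_dataset("nvidia/HelpSteer2", split="train")
ds = ds.map(scalar_reward)
print(ds[0]["prompt"][:80], ds[0]["reward"])
```

A scalar of this kind is only one way to consume the annotations; approaches such as SteerLM 2.0 instead condition generation on the full attribute vector rather than a single collapsed score.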