12 Jun 2024 | Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev
**HelpSteer2: Open-source Dataset for Training Top-performing Reward Models**
NVIDIA researchers have released HelpSteer2, a permissively licensed (CC-BY-4.0) dataset designed to train state-of-the-art reward models. Consisting of only about 10,000 response pairs, it is significantly smaller than existing preference datasets such as HH-RLHF, yet remains highly effective for reward-model training. Its prompts are sourced primarily from ShareGPT, a platform where users share their conversations, and include multi-turn exchanges so that the resulting reward models handle multi-turn conversations well.
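To make the dataset's structure concrete, the sketch below loads HelpSteer2 from Hugging Face and prints the per-response attribute annotations. It assumes the `datasets` library and the public `nvidia/HelpSteer2` repository, and that the column names match the HelpSteer attribute set (helpfulness, correctness, coherence, complexity, verbosity), each rated on a 0-4 scale.

```python
# Minimal sketch: load HelpSteer2 and inspect its per-response attribute annotations.
# Assumes the Hugging Face `datasets` library and the public `nvidia/HelpSteer2` dataset,
# whose rows pair a (possibly multi-turn) prompt and a response with five 0-4 ratings.
from datasets import load_dataset

ds = load_dataset("nvidia/HelpSteer2", split="train")

example = ds[0]
print(example["prompt"][:200])    # prompt text, possibly spanning multiple turns
print(example["response"][:200])  # one of the responses collected for this prompt

# Each response carries five fine-grained attribute scores instead of a single preference bit.
for attr in ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]:
    print(attr, example[attr])
```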
The researchers trained reward models on top of the Llama 3 70B base model and an in-house Nemotron-4 340B base model. The strongest of these achieved a state-of-the-art score of 92.0% on Reward-Bench's primary dataset, outperforming both open and proprietary models as of 12 June 2024. They also proposed SteerLM 2.0, a model alignment approach that leverages the rich multi-attribute scores predicted by these reward models to train models to follow complex, multi-requirement instructions.
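As an illustration of how a multi-attribute reward model can be structured (a minimal sketch, not the authors' released implementation), the code below attaches a small regression head to a base LM's final hidden state so that it predicts the five HelpSteer2 attributes, then collapses them into a scalar reward via a weighted sum. The pooling scheme, class names, and any attribute weights passed to `scalar_reward` are assumptions for illustration.

```python
# Sketch of a multi-attribute regression reward head (illustrative, not the paper's exact code).
import torch
import torch.nn as nn

ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

class MultiAttributeRewardHead(nn.Module):
    """Predicts HelpSteer2-style attribute scores from a base LM's hidden states."""

    def __init__(self, hidden_size: int, num_attributes: int = len(ATTRIBUTES)):
        super().__init__()
        self.regression = nn.Linear(hidden_size, num_attributes)

    def forward(self, last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Pool the hidden state of each sequence's final non-padding token.
        last_index = attention_mask.sum(dim=1) - 1                        # (batch,)
        batch_idx = torch.arange(last_hidden_state.size(0), device=last_hidden_state.device)
        pooled = last_hidden_state[batch_idx, last_index]                 # (batch, hidden)
        return self.regression(pooled)                                    # (batch, num_attributes)

def scalar_reward(attribute_scores: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # Collapse the attribute vector into a single scalar, e.g. for pairwise evaluation.
    return attribute_scores @ weights

# Training would minimize a simple regression loss against the human 0-4 ratings, e.g.:
#   loss = nn.functional.mse_loss(head(hidden_states, mask), target_attribute_scores)
```

Predicting a vector of attributes rather than a single preference logit is what allows SteerLM-style alignment to condition training on fine-grained quality targets instead of a single "better/worse" signal.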
The paper details the dataset collection process, including how diverse prompts were gathered and responses generated, as well as the annotation procedure used to ensure high-quality data. Models trained with HelpSteer2 perform strongly across evaluation categories, particularly Chat-Hard, and the experiments demonstrate the effectiveness of the proposed alignment methods. The dataset and code are publicly available, encouraging further development and application in AI alignment.