12 Jun 2024 | Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev
HelpSteer2 is an open-source, permissively licensed (CC-BY-4.0) dataset designed to train high-performing reward models for large language models (LLMs). It contains only 10,000 response pairs, far fewer than most existing preference datasets, making reward-model training highly data-efficient. Prompts were sourced primarily from ShareGPT, a platform where users voluntarily share their conversations with ChatGPT, and were supplemented with proprietary prompts for specific use cases; multi-turn prompts are included. Responses were generated by in-house models rather than taken from external sources, so the dataset remains usable in both open-source and commercial settings. Each response was rated by multiple annotators on five attributes: helpfulness, correctness, coherence, complexity, and verbosity. Inter-annotator agreement is high, with Cohen's κ reaching 0.706 for helpfulness.
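To make the attribute annotations concrete, below is a minimal sketch (not code from the paper) of loading HelpSteer2 with the Hugging Face `datasets` library and reading the per-response scores. The column names (`prompt`, `response`, and one column per attribute) follow the dataset card at the Hugging Face URL given below and are assumed here rather than taken from the paper's code.

```python
# Illustrative sketch, not code from the paper: load HelpSteer2 and inspect
# the five human-annotated attribute scores for one sample.
# Column names are assumed from the Hugging Face dataset card.
from datasets import load_dataset

ds = load_dataset("nvidia/HelpSteer2", split="train")

sample = ds[0]
print(sample["prompt"][:200])
print(sample["response"][:200])
for attr in ("helpfulness", "correctness", "coherence", "complexity", "verbosity"):
    # Each attribute is an integer rating on a 0-4 Likert scale (per the dataset card).
    print(f"{attr}: {sample[attr]}")
```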
The dataset was used to train reward models, including models aligned with SteerLM 2.0, an approach that improves model alignment by leveraging the multi-attribute scores. Reward models trained on HelpSteer2 achieved state-of-the-art performance on Reward-Bench, outperforming existing open and proprietary models. Models aligned with these reward models were evaluated on MT-Bench, TruthfulQA, Arena Hard, and AlpacaEval 2.0 LC, with the SteerLM 2.0-aligned model performing best on MT-Bench. The dataset is available at https://huggingface.co/datasets/nvidia/HelpSteer2 and the training code at https://github.com/NVIDIA/NeMo-Aligner. Together, the dataset and its associated models demonstrate the effectiveness of HelpSteer2 for training high-quality reward models that align LLMs with human preferences.
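As a rough illustration of how the multi-attribute scores can drive a reward model, here is a minimal PyTorch sketch of a regression-style reward head that predicts the five HelpSteer2 attributes from a causal LM's final hidden state. This is a hypothetical simplification for exposition, not the NeMo-Aligner implementation; the backbone choice, last-token pooling, and MSE objective are assumptions.

```python
# Hypothetical sketch of a multi-attribute regression reward model;
# the actual HelpSteer2 reward models are trained with NeMo-Aligner and differ in detail.
import torch
import torch.nn as nn
from transformers import AutoModel

ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

class MultiAttributeRewardModel(nn.Module):
    def __init__(self, base_model_name: str):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_model_name)
        hidden_size = self.backbone.config.hidden_size
        # One regression output per annotated attribute.
        self.head = nn.Linear(hidden_size, len(ATTRIBUTES))

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # Pool with the hidden state of the last non-padding token (assumes right padding).
        last_token_idx = attention_mask.sum(dim=1) - 1
        batch_idx = torch.arange(input_ids.size(0), device=input_ids.device)
        pooled = out.last_hidden_state[batch_idx, last_token_idx]
        return self.head(pooled)  # shape (batch, 5): predicted attribute scores

def regression_loss(predicted: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Simple MSE against the human-annotated 0-4 attribute ratings.
    return nn.functional.mse_loss(predicted, target)
```

At inference time, a scalar reward can be derived from the predicted attribute vector, for example by using the helpfulness prediction alone or a weighted combination of all five attributes.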