OpenRLHF is an open-source framework for scalable, high-performance reinforcement learning from human feedback (RLHF) training of large language models (LLMs). It addresses the challenge of coordinating four models (actor, critic, reward, and reference) during training, especially for models with over 70 billion parameters. Unlike existing frameworks that co-locate these models on the same GPUs, OpenRLHF uses Ray, vLLM, and DeepSpeed to distribute them across multiple GPUs, improving resource utilization and training efficiency. It integrates seamlessly with Hugging Face and provides optimized algorithms and launch scripts, making it usable out of the box. OpenRLHF supports various alignment techniques, including RLHF, DPO, rejection sampling, and others, enabling full-scale RLHF training.
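The scheduling idea behind this distribution can be illustrated with a short Ray sketch. This is not OpenRLHF's internal API: the `ModelWorker` class, its method names, and the use of `gpt2` as a small stand-in checkpoint are illustrative assumptions. The point is that each model role becomes its own Ray actor requesting a dedicated GPU, so the four models no longer compete for memory on a single device.

```python
# Illustrative sketch only: not OpenRLHF's actual API. It shows how Ray can
# place each RLHF model on its own GPU worker instead of co-locating all four
# models on one device. "gpt2" is a small stand-in checkpoint.
import ray
import torch
from transformers import AutoModelForCausalLM


@ray.remote(num_gpus=1)
class ModelWorker:
    """Holds one RLHF role (actor, critic, reward, or reference) on a dedicated GPU."""

    def __init__(self, model_name: str):
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype=torch.bfloat16
        ).cuda()

    @torch.no_grad()
    def logits(self, input_ids):
        ids = torch.as_tensor(input_ids, device="cuda")
        return self.model(ids).logits.cpu()


if __name__ == "__main__":
    ray.init()
    # Each role becomes a separate Ray actor, so Ray schedules them on
    # different GPUs rather than packing them onto the same one.
    policy = ModelWorker.remote("gpt2")
    reference = ModelWorker.remote("gpt2")
    # Critic and reward workers would be created the same way from their own checkpoints.
```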
The framework optimizes both training and inference efficiency. It leverages vLLM's tensor parallelism and other techniques to accelerate generation, especially for large models that cannot fit on a single GPU. In the learning stage, OpenRLHF employs techniques such as offloading Adam optimizer states to the CPU, using Flash Attention 2, and removing redundant padding to improve performance. It also uses ZeRO stage 3 to shard model parameters, gradients, and optimizer states, and synchronizes weights between the ZeRO training engines and the vLLM inference engines via NCCL.
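The weight-synchronization step can be sketched as follows, assuming the ZeRO-3 training ranks and the vLLM workers have already joined a shared NCCL process group; `sync_group` is a hypothetical handle and the receiving side inside vLLM is not shown.

```python
# Sketch of the training-side weight sync under the assumptions stated above.
import deepspeed
import torch.distributed as dist


def push_weights_to_vllm(zero3_model, sync_group):
    """Broadcast full (unsharded) weights from training rank 0 to the vLLM workers."""
    for param in zero3_model.parameters():
        # ZeRO-3 keeps each parameter sharded across ranks; this context manager
        # temporarily reassembles the full tensor so it can be broadcast.
        with deepspeed.zero.GatheredParameters([param]):
            if dist.get_rank() == 0:
                # Training rank 0 is assumed to be the only training process in
                # `sync_group`; each vLLM worker posts a matching broadcast,
                # receives the tensor, and loads it into its inference engine.
                dist.broadcast(param.data, src=0, group=sync_group)
```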
To stabilize training, OpenRLHF applies several PPO tricks: reward normalization, distributed advantage normalization, initializing the critic model with the weights of the reward model, and using a lower learning rate for the actor than for the critic. It also freezes the actor weights during the initial learning stage so that the critic is better initialized before policy updates begin.
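A minimal sketch of these tricks is shown below, using tiny placeholder networks in place of real LLMs; the learning rates, the freeze-step count, and all helper names are illustrative assumptions rather than OpenRLHF defaults.

```python
# Placeholder networks stand in for the LLM-based actor, critic, and reward models.
import torch
import torch.distributed as dist
import torch.nn as nn

actor = nn.Linear(16, 16)          # stand-in for the policy model
critic = nn.Linear(16, 1)          # stand-in for the value model
reward_model = nn.Linear(16, 1)    # stand-in for the reward model

# Initialize the critic from the reward model's weights (matching architectures assumed).
critic.load_state_dict(reward_model.state_dict())

FREEZE_ACTOR_STEPS = 20  # actor updates are skipped early so the critic can warm up
actor_opt = torch.optim.Adam(actor.parameters(), lr=5e-7)    # lower LR for the actor
critic_opt = torch.optim.Adam(critic.parameters(), lr=9e-6)  # higher LR for the critic


def normalize_advantages(advantages: torch.Tensor) -> torch.Tensor:
    """Advantage normalization with statistics aggregated across data-parallel ranks."""
    count = torch.tensor(float(advantages.numel()), device=advantages.device)
    total = advantages.sum()
    total_sq = advantages.pow(2).sum()
    if dist.is_initialized():
        for t in (count, total, total_sq):
            dist.all_reduce(t)  # global count, sum, and sum of squares
    mean = total / count
    var = (total_sq / count - mean.pow(2)).clamp_min(1e-8)
    return (advantages - mean) / var.sqrt()


def ppo_update(step: int, actor_loss: torch.Tensor, critic_loss: torch.Tensor) -> None:
    """One optimization step; the actor stays frozen during the warm-up phase."""
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    if step >= FREEZE_ACTOR_STEPS:
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
```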
OpenRLHF provides one-click trainable scripts for supported algorithms, fully compatible with the Hugging Face library. It supports a wide range of models and training techniques, including Mixture of Experts (MoE), Jamba, and QLoRA. The framework has been tested and compared with other RLHF frameworks like DSChat, showing significant performance advantages in terms of training time and resource utilization. OpenRLHF is available at https://github.com/OpenRLHF/OpenRLHF.
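Hugging Face compatibility means, for example, that a checkpoint produced by the training scripts can be loaded back with the standard transformers API; the checkpoint path below is a placeholder for an actual output directory.

```python
# Loading a trained checkpoint with the standard transformers API.
# The path is a placeholder for the output directory of a training run.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "./checkpoints/my-rlhf-actor"  # hypothetical output directory
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(ckpt_dir)

prompt = "Explain RLHF in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```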