8 Dec 2024 | Eunseop Yoon, Hee Suk Yoon, SooHwan Eom, Gunsoo Han, Daniel Wontae Nam, Daejin Jo, Kyoung-Woon On, Mark Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo
TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback
This paper introduces TLCR, a novel reward model for fine-grained reinforcement learning from human feedback (RLHF). TLCR provides continuous-scale dense reward signals at the token level, addressing the limitations of previous sequence-level or token-level discrete reward methods. The key idea is to use a discriminator trained to distinguish positive and negative tokens, and to assign continuous rewards based on the confidence of the discriminator.
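To make the confidence-to-reward idea concrete, here is a minimal sketch assuming a binary token-level preference discriminator loaded as a Hugging Face token-classification model. The checkpoint path is hypothetical, and the simple 2p − 1 mapping from confidence to reward is an illustrative choice, not necessarily the paper's exact formulation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical checkpoint: any binary token classifier fine-tuned as a
# token-level preference discriminator could be substituted here.
DISCRIMINATOR = "path/to/token-preference-discriminator"

tokenizer = AutoTokenizer.from_pretrained(DISCRIMINATOR)
model = AutoModelForTokenClassification.from_pretrained(DISCRIMINATOR)  # 2 labels: negative / positive


@torch.no_grad()
def token_level_rewards(prompt: str, response: str) -> torch.Tensor:
    """Map the discriminator's per-token confidence to a continuous reward
    in [-1, 1]: near +1 for confidently positive tokens, near -1 for
    confidently negative ones, and ~0 where the discriminator is unsure.
    The whole sequence is scored here; in practice only the response
    positions would be kept as the dense reward signal."""
    inputs = tokenizer(prompt, response, return_tensors="pt")
    logits = model(**inputs).logits              # (1, seq_len, 2)
    p_pos = logits.softmax(dim=-1)[0, :, 1]      # P(token is "positive")
    return 2.0 * p_pos - 1.0                     # confidence-scaled dense reward
```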
The paper first discusses the challenges of using sequence-level rewards in RLHF, which create a mismatch between sequence-level preference labels and the token-level decisions the policy actually makes. It then presents TLCR, which uses a token-level preference discriminator to assign a continuous reward to each token. The discriminator is trained on token-level preference labels that an external, mature language model derives from sequence-level human preference data.
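A minimal sketch of the corresponding training objective is shown below, assuming the external LLM's judgments have already been converted into per-token labels (1 for positive, 0 for negative, -100 for prompt or padding positions). These label conventions are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F


def discriminator_loss(logits: torch.Tensor,
                       token_labels: torch.Tensor) -> torch.Tensor:
    """Token-level cross-entropy for the preference discriminator.

    logits:       (batch, seq_len, 2) discriminator scores per token.
    token_labels: (batch, seq_len) with 1 for tokens the external LLM marked
                  as positive, 0 for negative, and -100 for prompt/padding
                  positions, which are excluded from the loss.
    """
    return F.cross_entropy(
        logits.reshape(-1, 2),
        token_labels.reshape(-1),
        ignore_index=-100,
    )
```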
The paper evaluates TLCR on open-ended generation benchmarks and shows that it consistently outperforms prior sequence-level and token-level discrete reward baselines. It also demonstrates that TLCR provides more accurate and fine-grained feedback than these traditional reward schemes.
The paper also discusses the limitations of TLCR, including the need for a large dataset and the potential for bias in the token-level preference discriminator. It concludes that TLCR offers a more effective and efficient approach to fine-grained RLHF compared to traditional methods.