OffsetBias: Leveraging Debiased Data for Tuning Evaluators

7 Oct 2024 | Junsoo Park, Seungyeon Jwa, Meiying Ren, Daeyoung Kim, Sanghyuk Choi
The paper "OffsetBias: Leveraging Debias Data for Tuning Evaluators" by Junsoo Park, Seungyeon Jwa, Meiyng Ren, Daeyoung Kim, and Sanghyuk Choi addresses the issue of biases in Large Language Models (LLMs) used for evaluating text quality. The authors identify six types of biases in judge models, including length bias, concreteness bias, empty reference bias, content continuation bias, nested instruction bias, and familiar knowledge bias. They propose EVALBIASBENCH, a collection of 80 evaluation instances to quantify the robustness of judge models against these biases. To mitigate these biases, the authors construct OFFSETBIAS, a preference dataset that includes pairs of good and bad responses, where the bad responses contain critical errors but exhibit stylistic qualities preferred by judge models. The dataset is designed to be integrated into the training of judge models to offset existing biases. Experimental results show that fine-tuning judge models on OFFSETBIAS significantly enhances their robustness to biases and improves performance across various evaluation scenarios. The authors also demonstrate the effectiveness of OFFSETBIAS in training reward models. The main contributions of the paper are: 1. Identifying six types of biases in judge models and proposing EVALBIASBENCH. 2. Proposing OFFSETBIAS and its construction methods to enhance judge models' performance. 3. Show that incorporating OFFSETBIAS into judge model training improves robustness to biases and enhances general judging capability. The paper discusses related work, including LLM-based evaluation and meta-evaluation benchmarks, and provides a detailed methodology for constructing the OFFSETBIAS dataset and evaluating the effectiveness of the proposed approach. The authors conclude by discussing limitations and ethical considerations, emphasizing the need for further research to address biases in evaluation models.The paper "OffsetBias: Leveraging Debias Data for Tuning Evaluators" by Junsoo Park, Seungyeon Jwa, Meiyng Ren, Daeyoung Kim, and Sanghyuk Choi addresses the issue of biases in Large Language Models (LLMs) used for evaluating text quality. The authors identify six types of biases in judge models, including length bias, concreteness bias, empty reference bias, content continuation bias, nested instruction bias, and familiar knowledge bias. They propose EVALBIASBENCH, a collection of 80 evaluation instances to quantify the robustness of judge models against these biases. To mitigate these biases, the authors construct OFFSETBIAS, a preference dataset that includes pairs of good and bad responses, where the bad responses contain critical errors but exhibit stylistic qualities preferred by judge models. The dataset is designed to be integrated into the training of judge models to offset existing biases. Experimental results show that fine-tuning judge models on OFFSETBIAS significantly enhances their robustness to biases and improves performance across various evaluation scenarios. The authors also demonstrate the effectiveness of OFFSETBIAS in training reward models. The main contributions of the paper are: 1. Identifying six types of biases in judge models and proposing EVALBIASBENCH. 2. Proposing OFFSETBIAS and its construction methods to enhance judge models' performance. 3. Show that incorporating OFFSETBIAS into judge model training improves robustness to biases and enhances general judging capability. 
The paper discusses related work, including LLM-based evaluation and meta-evaluation benchmarks, and provides a detailed methodology for constructing the OFFSETBIAS dataset and evaluating the effectiveness of the proposed approach. The authors conclude by discussing limitations and ethical considerations, emphasizing the need for further research to address biases in evaluation models.
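On the evaluation side, the summary describes EVALBIASBENCH as a set of labeled instances for quantifying bias robustness. The sketch below shows one plausible way to compute per-bias-type accuracy for a judge over such instances; the record schema and the `judge` callable are assumptions for illustration, not the benchmark's actual interface.

```python
# Minimal sketch: scoring a judge on EVALBIASBENCH-style instances,
# grouped by the type of bias each instance is designed to elicit.
from collections import defaultdict
from typing import Callable, Iterable


def bias_robustness(
    instances: Iterable[dict],
    judge: Callable[[str, str, str], str],  # (instruction, resp_a, resp_b) -> "A" or "B"
) -> dict:
    """Return accuracy per bias type: the fraction of instances where the judge
    picks the genuinely better response despite the bias-inducing distractor."""
    correct, total = defaultdict(int), defaultdict(int)
    for inst in instances:
        verdict = judge(inst["instruction"], inst["response_a"], inst["response_b"])
        total[inst["bias_type"]] += 1
        if verdict == inst["label"]:  # label marks the truly better response
            correct[inst["bias_type"]] += 1
    return {bias: correct[bias] / total[bias] for bias in total}


if __name__ == "__main__":
    # Dummy judge that always answers "A", just to exercise the function.
    demo = [
        {"instruction": "Summarize briefly.", "response_a": "Short, correct summary.",
         "response_b": "Long, flawed summary.", "label": "A", "bias_type": "length"},
    ]
    print(bias_robustness(demo, lambda i, a, b: "A"))
```

A per-bias breakdown like this makes it easy to see which of the six identified biases a fine-tuned judge still falls for, rather than hiding them inside a single aggregate score.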