BLEURT: Learning Robust Metrics for Text Generation

21 May 2020 | Thibault Sellam, Dipanjan Das, Ankur P. Parikh
BLEURT is a learned evaluation metric for text generation. Built on BERT, it is designed to model human judgments from only a few thousand possibly biased training examples. Its key innovation is a pre-training scheme that uses millions of synthetic examples to help the model generalize: synthetic sentence pairs are generated by randomly perturbing Wikipedia sentences, and each pair is augmented with a variety of lexical- and semantic-level supervision signals. BLEURT outperforms existing metrics on the WMT Metrics shared task and the WebNLG Competition dataset, even when training data is limited or the test data is out of distribution. Evaluations on translation and data-to-text tasks show that it is robust to quality drifts and adapts to new tasks. Ablation experiments show that the pre-training scheme significantly improves BLEURT's performance, especially in the IID setting.
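To make the synthetic-data idea concrete, here is a minimal sketch of generating (reference, perturbed candidate, signal) triples. It is not the authors' pipeline: the paper combines several perturbation methods and supervision signals, whereas this sketch assumes a single simple perturbation (randomly dropping words) and a single lexical signal (a ROUGE-1-style unigram F1). The helper names `drop_words`, `unigram_f1`, and `make_synthetic_pairs` are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
import random
from collections import Counter


def drop_words(sentence: str, rng: random.Random, max_frac: float = 0.3) -> str:
    """Hypothetical perturbation: delete a random subset of whitespace tokens.

    At least one token is dropped and, with the default max_frac, at least one is kept.
    """
    tokens = sentence.split()
    if len(tokens) < 2:
        return sentence  # too short to perturb meaningfully
    n_drop = rng.randint(1, max(1, int(len(tokens) * max_frac)))
    drop_idx = set(rng.sample(range(len(tokens)), n_drop))
    return " ".join(t for i, t in enumerate(tokens) if i not in drop_idx)


def unigram_f1(reference: str, candidate: str) -> float:
    """ROUGE-1-style F1 over lowercased whitespace tokens (one illustrative lexical signal)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def make_synthetic_pairs(sentences, seed: int = 0):
    """Build (reference, candidate, signal) triples; seeding keeps the output reproducible."""
    rng = random.Random(seed)
    pairs = []
    for ref in sentences:
        cand = drop_words(ref, rng)
        pairs.append((ref, cand, unigram_f1(ref, cand)))
    return pairs


if __name__ == "__main__":
    # Stand-in for Wikipedia sentences.
    corpus = [
        "The cat sat on the mat near the door.",
        "BLEURT is a learned evaluation metric for text generation.",
    ]
    for ref, cand, score in make_synthetic_pairs(corpus):
        print(f"{score:.2f} | {cand}")
```

In a full pre-training setup, each pair would carry several such signals at once, and the model would be trained to predict all of them before being fine-tuned on human ratings.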