July 2002 | Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu
The paper introduces BLEU (Bilingual Evaluation Understudy), an automatic method for evaluating machine translation (MT) that is quick, inexpensive, language-independent, and highly correlated with human judgments. BLEU addresses the cost and slowness of human evaluation by providing a metric that can be run frequently during MT system development. Its core is a modified $n$-gram precision, which measures how closely a candidate translation matches one or more reference translations. This precision is combined with a brevity penalty that penalizes candidates shorter than the references. The authors demonstrate that BLEU distinguishes good translations from bad ones and correlates highly with the judgments of both monolingual and bilingual human evaluators. The method is validated through experiments on a large corpus of Chinese-English translations and human evaluations, confirming its reliability and practical utility in MT research and development.
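
To make the two ingredients concrete, here is a minimal Python sketch of modified $n$-gram precision and the brevity penalty. It is an illustration under stated assumptions, not the authors' reference implementation: the function names are hypothetical, inputs are assumed to be pre-tokenized lists of words, the score is computed for a single sentence (the paper aggregates counts over a whole test corpus), uniform weights over 1- to 4-grams are assumed, and no smoothing is applied, so any zero precision yields a score of zero.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram count is capped at
    the largest count of that n-gram in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Return 1 if the candidate is longer than the effective reference
    length r, otherwise exp(1 - r / c)."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Assumption: r is the reference length closest to the candidate length
    # (ties go to the shorter reference).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu(candidate, references, max_n=4):
    """Brevity penalty times the geometric mean of the modified precisions
    for n = 1..max_n, with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


if __name__ == "__main__":
    refs = ["the cat is on the mat".split(),
            "there is a cat on the mat".split()]
    degenerate = ["the"] * 7
    # Clipping caps "the" at 2, its highest count in any single reference.
    print(modified_precision(degenerate, refs, 1))
    print(sentence_bleu(refs[0], refs))
```

Clipping is what keeps a degenerate candidate from scoring well: seven copies of "the" would have an ordinary unigram precision of 7/7, but the modified precision is 2/7 because no reference contains "the" more than twice. A candidate identical to a reference scores 1.0.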