COMET: A Neural Framework for MT Evaluation

November 16–20, 2020 | Ricardo Rei, Craig Stewart, Ana C Farinha, Alon Lavie
COMET is a neural framework for training multilingual machine translation (MT) evaluation models that achieve new state-of-the-art levels of correlation with human judgments. The framework leverages recent breakthroughs in cross-lingual pretrained language modeling to create highly multilingual and adaptable MT evaluation models that use information from both the source input and a target-language reference translation to more accurately predict MT quality. Three models were trained on different types of human judgments: Direct Assessments (DA), Human-mediated Translation Edit Rate (HTER), and Multidimensional Quality Metrics (MQM). These models achieved new state-of-the-art performance on the WMT 2019 Metrics shared task and proved robust when evaluating high-performing systems.

The framework includes two distinct architectures: the Estimator model, which is trained to regress directly on a quality score, and the Translation Ranking model, which is trained to minimize the distance between a "better" hypothesis and both its corresponding reference and its original source. Both architectures use a cross-lingual encoder followed by a pooling layer to produce sentence embeddings. The Estimator model combines features derived from the source, hypothesis, and reference embeddings to predict quality scores, while the Translation Ranking model uses a triplet margin loss to optimize the embedding space.

The framework was evaluated on three corpora: the QT21 corpus, the WMT DARR corpus, and the MQM corpus. Across these, the COMET models outperformed existing metrics both in correlation with human judgments and in robustness to high-quality MT systems. The models generalized well across language pairs and remained effective even when trained on data that did not include English as a target language. Including the source-language input proved important for learning accurate predictions and improved overall correlation with human judgments.
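To make the Estimator architecture concrete, the sketch below combines pooled sentence embeddings of the hypothesis, source, and reference into a feature vector via element-wise products and absolute differences, then maps it to a scalar score with a linear layer. The exact feature set, dimensions, and regressor here are simplified, illustrative assumptions, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def pooled_embedding(dim: int) -> np.ndarray:
    # Stand-in for a pooled cross-lingual encoder embedding of one sentence.
    return rng.standard_normal(dim)

def estimator_features(hyp: np.ndarray, src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    # Combine hypothesis, source, and reference embeddings into one feature
    # vector: raw embeddings plus element-wise products and absolute
    # differences (an assumed, simplified feature set).
    return np.concatenate([
        hyp, ref,
        hyp * src, hyp * ref,
        np.abs(hyp - src), np.abs(hyp - ref),
    ])

def regress(features: np.ndarray, weights: np.ndarray, bias: float) -> float:
    # A single linear layer standing in for the feed-forward regressor
    # that maps the features to a scalar quality score.
    return float(features @ weights + bias)

dim = 8
hyp, src, ref = (pooled_embedding(dim) for _ in range(3))
feats = estimator_features(hyp, src, ref)
weights = rng.standard_normal(feats.shape[0]) * 0.01
score = regress(feats, weights, 0.0)
```

In the trained model the regressor is a learned feed-forward network and the embeddings come from the fine-tuned cross-lingual encoder; here both are mocked so the feature-combination step stands out.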
The COMET framework is released to the research community, along with the trained MT evaluation models and detailed scripts for running all reported baselines. The framework is built on top of PyTorch Lightning, a lightweight PyTorch wrapper that provides maximal flexibility and reproducibility. The framework has the potential to be used for a wide range of MT evaluation tasks and is expected to contribute to the development of more accurate and robust MT evaluation metrics.
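The Translation Ranking objective described earlier can be sketched with a plain triplet margin loss: the anchor is the source (or reference) embedding, the positive is the "better" hypothesis, and the negative is the worse one. The Euclidean distance and the margin value below are illustrative assumptions, not the framework's exact configuration.

```python
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    # The loss is zero once the better hypothesis is closer to the anchor
    # than the worse hypothesis by at least `margin`; otherwise training
    # pulls the positive in and pushes the negative away.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

src = np.array([0.0, 0.0])      # anchor: pooled source embedding
better = np.array([0.0, 1.0])   # positive: "better" hypothesis embedding
worse = np.array([5.0, 0.0])    # negative: worse hypothesis embedding

loss_ok = triplet_margin_loss(src, better, worse)   # already well separated
loss_bad = triplet_margin_loss(src, worse, better)  # positive loss: needs optimizing
```

In COMET this loss is applied over embeddings produced by the shared cross-lingual encoder, so minimizing it shapes the embedding space itself rather than a task-specific head.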