Meteor Universal brings language-specific translation evaluation to any target language, including languages that were previously unsupported. It does this by automatically extracting linguistic resources, such as paraphrase tables and function word lists, from the bitext used to train machine translation (MT) systems. It also uses a universal parameter set learned from human judgments across multiple language directions. This parameter set generalizes across languages and captures general human preferences, which allows Meteor Universal to significantly outperform baseline BLEU on two new languages, Russian and Hindi.
Meteor Universal improves evaluation accuracy by combining these automatically learned linguistic resources with the universal parameter set. Its scoring function aligns a translation hypothesis with a reference translation and computes a sentence-level similarity score. Matches between the hypothesis and the reference are identified by several matchers (exact, stem, synonym, and paraphrase), and each match is weighted according to the matcher that produced it and whether it involves content or function words. From these weighted matches the system computes precision and recall, combines them in a parameterized harmonic mean, and then discounts the result by a fragmentation penalty that accounts for gaps and differences in word order between the hypothesis and the reference.
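The scoring stage can be sketched in Python as follows. This is a minimal sketch, not the Meteor implementation: it assumes the alignment has already been computed and consists only of one-to-one word matches (real paraphrase matches can span several words), and the parameter values and matcher weights in the hypothetical `Params` class are placeholders, not the published universal parameters. The formulas follow the usual Meteor definitions: content words are weighted by `delta` and function words by `1 - delta`, Fmean = P*R / (alpha*P + (1 - alpha)*R), the penalty is gamma * (chunks / matches)^beta, and the score is (1 - penalty) * Fmean.

```python
from dataclasses import dataclass, field


@dataclass
class Params:
    # Illustrative values only; NOT the published Meteor Universal parameters.
    alpha: float = 0.7   # precision/recall balance in Fmean
    beta: float = 1.0    # exponent shaping the fragmentation penalty
    gamma: float = 0.5   # maximum size of the fragmentation penalty
    delta: float = 0.75  # relative weight of content vs. function words
    weights: dict = field(default_factory=lambda: {
        "exact": 1.0, "stem": 0.6, "synonym": 0.8, "paraphrase": 0.6})


def weighted_ratio(tokens, matched, function_words, p):
    """Weighted precision (hypothesis side) or recall (reference side).

    `matched` maps a token position to the name of the matcher that aligned it.
    """
    num = denom = 0.0
    for i, tok in enumerate(tokens):
        # Content words get weight delta, function words get 1 - delta.
        lex = (1 - p.delta) if tok in function_words else p.delta
        denom += lex
        if i in matched:
            num += p.weights[matched[i]] * lex
    return num / denom if denom else 0.0


def count_chunks(alignment):
    """Count maximal runs of matches that are adjacent and in order on both sides."""
    chunks, prev = 0, None
    for h, r in sorted((h, r) for h, r, _ in alignment):
        if prev is None or (h, r) != (prev[0] + 1, prev[1] + 1):
            chunks += 1
        prev = (h, r)
    return chunks


def meteor_sentence(hyp, ref, alignment, function_words, p=None):
    """Score one hypothesis against one reference.

    alignment: list of (hyp_index, ref_index, matcher) one-to-one word matches.
    """
    p = p or Params()
    if not alignment:
        return 0.0
    P = weighted_ratio(hyp, {h: m for h, _, m in alignment}, function_words, p)
    R = weighted_ratio(ref, {r: m for _, r, m in alignment}, function_words, p)
    fmean = P * R / (p.alpha * P + (1 - p.alpha) * R)
    # With one-to-one matches, the matched-word count is the same on both sides.
    pen = p.gamma * (count_chunks(alignment) / len(alignment)) ** p.beta
    return (1 - pen) * fmean


hyp = "the cat sat on the mat".split()
ref = "the cat sat on a mat".split()
align = [(0, 0, "exact"), (1, 1, "exact"), (2, 2, "exact"),
         (3, 3, "exact"), (5, 5, "exact")]
score = meteor_sentence(hyp, ref, align, {"the", "a", "on"})
print(round(score, 4))  # 0.7333
```

In the usage example the five matched words form two chunks ("the cat sat on" and "mat"), so the penalty is 0.5 * 2/5 = 0.2, and the score is 0.8 times the shared weighted precision and recall of 2.75/3.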
Meteor Universal uses language-specific resources, such as function word lists and paraphrase tables, to improve evaluation accuracy. Function word lists distinguish content words from function words in the target language, while paraphrase tables allow many-to-many matches that can capture local language phenomena. Both resources are extracted automatically from the training bitext; the paraphrase tables are built with a translation pivot approach, in which target-language phrases that translate to the same source-language phrase are treated as candidate paraphrases.
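Both extraction steps can be sketched as below, assuming whitespace-tokenized target sentences and phrase translation probabilities supplied as nested dictionaries (a hypothetical input format). The default relative-frequency threshold of 10^-3 for function words is an illustrative assumption, and the pivot step uses the standard formulation p(e2 | e1) = sum over f of p(e2 | f) * p(f | e1); any pruning or filtering that Meteor Universal applies to the resulting table is not shown.

```python
from collections import Counter, defaultdict


def extract_function_words(target_sentences, threshold=1e-3):
    """Return target-side words whose relative frequency exceeds `threshold`.

    The default threshold is an assumption made for illustration.
    """
    counts = Counter(tok for sent in target_sentences for tok in sent.split())
    total = sum(counts.values())
    return {w for w, c in counts.items() if c / total > threshold}


def pivot_paraphrases(p_f_given_e, p_e_given_f):
    """Pivot paraphrase scores: p(e2 | e1) = sum over f of p(e2 | f) * p(f | e1).

    p_f_given_e: {target phrase e1: {source phrase f: p(f | e1)}}
    p_e_given_f: {source phrase f: {target phrase e2: p(e2 | f)}}
    """
    table = {}
    for e1, pivots in p_f_given_e.items():
        scores = defaultdict(float)
        for f, p_f in pivots.items():
            for e2, p_e2 in p_e_given_f.get(f, {}).items():
                if e2 != e1:  # a phrase is not its own paraphrase
                    scores[e2] += p_e2 * p_f
        table[e1] = dict(scores)
    return table


corpus = ["the cat sat on the mat", "the dog ran to the park", "a bird sat on a tree"]
print(extract_function_words(corpus, threshold=0.15))  # {'the'}

# Toy phrase tables (made-up probabilities, English target, German pivot).
p_f_given_e = {"big": {"groß": 0.7, "viel": 0.3}}
p_e_given_f = {"groß": {"big": 0.5, "large": 0.4, "tall": 0.1},
               "viel": {"much": 0.6, "big": 0.4}}
para = pivot_paraphrases(p_f_given_e, p_e_given_f)
print(sorted(para["big"]))  # ['large', 'much', 'tall']
```

The toy call uses a much higher threshold than the default because every word in a three-sentence corpus is relatively frequent; on a full training bitext the small default would separate high-frequency words from the rest.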
The universal parameter set is learned from a large set of binary ranking judgments from WMT12, covering 8 language directions. These parameters capture general human preferences: recall over precision, word choice over word order, and correct translation of content words over function words. Compared with parameter sets tuned for individual languages, the universal set is more balanced, reflecting a normalizing effect of generalizing across several language directions.
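The sketch below shows one way a single parameter set could be fit to binary ranking judgments pooled across language directions. It assumes a Kendall's-tau-style objective, (concordant - discordant) / (concordant + discordant), maximized by exhaustive grid search; the tie handling, the grid, and the toy judgments are assumptions for illustration, and only `alpha` from the Fmean formula is tuned here.

```python
import itertools


def kendall_tau(judgments, score):
    """Tau over binary rankings: (concordant - discordant) / (concordant + discordant).

    judgments: list of (better, worse) pairs as ranked by human judges.
    Metric ties count as discordant here, which is an assumption.
    """
    concordant = sum(1 for better, worse in judgments if score(better) > score(worse))
    discordant = len(judgments) - concordant
    return (concordant - discordant) / len(judgments) if judgments else 0.0


def grid_search(judgments, make_scorer, grid):
    """Try every combination in `grid` and keep the one with the highest tau.

    grid: {parameter name: list of candidate values}
    make_scorer: builds a score function from a {name: value} dict.
    """
    names = list(grid)
    best, best_tau = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        tau = kendall_tau(judgments, make_scorer(params))
        if tau > best_tau:
            best, best_tau = params, tau
    return best, best_tau


def make_fmean(params):
    """Score (precision, recall) pairs with Fmean = P*R / (alpha*P + (1 - alpha)*R)."""
    a = params["alpha"]
    return lambda pr: pr[0] * pr[1] / (a * pr[0] + (1 - a) * pr[1])


# Toy judgments, not real WMT12 data: items are (precision, recall) pairs.
# Judgments from every direction are pooled so one parameter set fits them all.
judgments_by_direction = {
    "cs-en": [((0.5, 0.9), (0.9, 0.5))],
    "en-de": [((0.4, 0.8), (0.8, 0.4))],
}
pooled = [j for js in judgments_by_direction.values() for j in js]
best, tau = grid_search(pooled, make_fmean, {"alpha": [0.1, 0.5, 0.9]})
print(best, tau)  # {'alpha': 0.9} 1.0
```

Because the toy judges always prefer the higher-recall output, the search selects the largest `alpha`, which shifts Fmean toward recall, mirroring the recall-over-precision preference described above.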
Experiments show that Meteor Universal significantly outperforms baseline BLEU on WMT13 Russian and WMT14 Hindi. Although it performs slightly worse than versions of Meteor tuned for individual languages, these results provide substantial evidence that it will generalize further, bringing improved evaluation accuracy to new target languages. Meteor Universal is included in Meteor version 1.5, which is publicly released for WMT14. It is free software released under the terms of the GNU Lesser General Public License.