Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation

6 Jun 2024 | Giorgos Vernikos, Andrei Popescu-Belis
This paper introduces QE-fusion, a novel method that combines machine translation hypotheses using quality estimation (QE) metrics to improve translation quality. QE-fusion draws on a pool of candidate translations generated by a model and merges spans from different candidates, guided by a QE metric such as COMETKiwi. The method is compared against beam search and recent reranking techniques such as Minimum Bayes Risk decoding and QE-reranking. The results show that QE-fusion consistently improves translation quality, as measured by COMET and BLEURT, across several large language models (LLMs) and multilingual translation models (NLLB) on five language pairs. Notably, QE-fusion yields larger improvements for LLMs, owing to their ability to generate diverse outputs. The approach produces novel translations (absent from the candidate pool) in over half of the cases, and its runtime scales linearly with the number of candidates. The paper further analyzes the role of candidate diversity, computation time, and the effect of QE-fusion on hallucinations, showing that it reduces hallucinations and maintains its advantage even as the candidate pool grows.
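The paper's actual fusion algorithm is more involved; the Python sketch below only illustrates the general idea under simplifying assumptions of my own: candidates are aligned at the token level with `difflib`, divergent spans are swapped greedily, and `qe_score` is a hypothetical stand-in for a reference-free QE metric such as COMETKiwi.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterator, List, Tuple


def divergent_spans(base: List[str], cand: List[str]) -> Iterator[Tuple[Tuple[int, int], List[str]]]:
    """Yield (base_span, replacement_tokens) wherever the candidate differs from the base."""
    matcher = SequenceMatcher(a=base, b=cand)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            yield (i1, i2), cand[j1:j2]


def qe_fusion(source: str, candidates: List[str],
              qe_score: Callable[[str, str], float]) -> str:
    """Greedy span-level fusion of pooled hypotheses, guided by a QE scorer."""
    # Start from the candidate the QE metric ranks highest (the QE-reranking
    # output), then try to improve it span by span using the other candidates.
    tokens = max(candidates, key=lambda c: qe_score(source, c)).split()
    for cand in candidates:
        spans = list(divergent_spans(tokens, cand.split()))
        # Apply edits right-to-left so an accepted swap on the right does not
        # invalidate the indices of spans further to the left.
        for (i1, i2), repl in reversed(spans):
            edited = tokens[:i1] + repl + tokens[i2:]
            if qe_score(source, " ".join(edited)) > qe_score(source, " ".join(tokens)):
                tokens = edited  # keep the swap only if the QE score improves
    return " ".join(tokens)
```

In practice, `qe_score(source, hypothesis)` would wrap a learned reference-free metric; it is left abstract here because the choice of metric (and of alignment method) is exactly what distinguishes a toy sketch from the system evaluated in the paper.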