6 Jun 2024 | Giorgos Vernikos, Andrei Popescu-Belis
QE-fusion is a method for combining machine translation hypotheses with a quality estimation (QE) metric in order to improve translation quality. It starts from a pool of candidate translations produced by a model, such as a large language model (LLM) or a multilingual NMT model, and fuses spans from different candidates, selecting them with a reference-free QE metric such as COMETKIWI.

Across a range of language pairs and models, QE-fusion outperforms beam search as well as reranking techniques such as Minimum Bayes Risk (MBR) decoding and QE-reranking in terms of COMET and BLEURT scores. The gains are largest for LLMs, whose sampled outputs are diverse enough for span recombination to pay off. QE-fusion produces novel translations, absent from the candidate pool, in more than half of the cases, and it reduces hallucinations compared with QE-reranking and MBR decoding. The method has been evaluated with the LLMs PolyLM, XGLM, Llama2, Mistral, ALMA, and Tower, as well as with multilingual NMT models such as NLLB.

The algorithm is also efficient: its runtime scales linearly with the number of candidates in the pool. Extensive experiments and comparisons with alternative techniques confirm that QE-fusion consistently improves translation quality.
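As a rough illustration of the span-fusion idea, the Python sketch below starts from the highest-scoring candidate and greedily accepts divergent spans from other candidates whenever they raise the QE score. It is a simplified reading of the approach, not the authors' exact algorithm, and the `qe_score` callable is a hypothetical stand-in for a real QE metric such as COMETKIWI.

```python
# Minimal sketch of QE-guided span fusion over a pool of candidate translations.
# NOT the paper's exact algorithm; it only mirrors the general idea.

from difflib import SequenceMatcher
from typing import Callable, List


def fuse_candidates(
    source: str,
    candidates: List[str],
    qe_score: Callable[[str, str], float],
) -> str:
    """Greedily fuse spans from a candidate pool, guided by a reference-free QE score."""
    # Rank candidates with the QE metric and keep the best one as the base.
    ranked = sorted(candidates, key=lambda c: qe_score(source, c), reverse=True)
    base, best_score = ranked[0].split(), qe_score(source, ranked[0])

    for alt in ranked[1:]:
        alt_tokens = alt.split()
        # Locate token spans where the alternative diverges from the current base.
        matcher = SequenceMatcher(a=base, b=alt_tokens)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            # Try replacing the base span with the alternative's span.
            variant = base[:i1] + alt_tokens[j1:j2] + base[i2:]
            score = qe_score(source, " ".join(variant))
            if score > best_score:
                base, best_score = variant, score
                break  # the alignment is stale once the base changes; move on
    return " ".join(base)


def dummy_qe(src: str, hyp: str) -> float:
    # Toy scorer for demonstration only: a real setup would query a learned
    # QE model (e.g. COMETKIWI) instead of comparing lengths.
    return -abs(len(hyp.split()) - len(src.split()))


if __name__ == "__main__":
    pool = ["the cat sat on mat", "a cat sat on the mat", "the cat is sitting"]
    print(fuse_candidates("le chat s'est assis sur le tapis", pool, dummy_qe))
```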
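For comparison, the two reranking baselines mentioned above can be sketched in a few lines. The scoring callables are again generic stand-ins rather than any specific metric implementation.

```python
# Sketches of the QE-reranking and MBR decoding baselines, with generic scorers.

from typing import Callable, List


def qe_rerank(source: str, candidates: List[str],
              qe_score: Callable[[str, str], float]) -> str:
    """QE-reranking: keep the single candidate with the highest QE score."""
    return max(candidates, key=lambda c: qe_score(source, c))


def mbr_decode(candidates: List[str],
               utility: Callable[[str, str], float]) -> str:
    """MBR decoding: pick the candidate with the highest average utility when
    the other candidates are treated as pseudo-references (here the hypothesis
    itself is kept in the reference set for simplicity)."""
    def expected_utility(hyp: str) -> float:
        return sum(utility(hyp, ref) for ref in candidates) / len(candidates)

    return max(candidates, key=expected_utility)
```

Both baselines return one of the original candidates unchanged, whereas QE-fusion can output a translation that appears nowhere in the pool, which is why it is reported to produce novel translations in more than half of the cases.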