This paper proposes a method to improve the efficiency of Minimum Bayes Risk (MBR) decoding in neural machine translation (NMT). MBR decoding is a text generation technique that improves translation quality but is computationally expensive due to its quadratic complexity in the number of sampled sequences. The authors propose to approximate pairwise metric scores with scores calculated against aggregated reference representations, reducing the complexity from O(n²) to O(n), while preserving most of the quality gains of MBR decoding.
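To make the quadratic cost concrete, standard sampling-based MBR can be sketched as follows. This is a minimal sketch, with a toy token-overlap (Jaccard) utility standing in for a real metric such as chrF or COMET; every hypothesis is scored against every sample used as a pseudo-reference, so the number of utility calls is n².

```python
def jaccard_utility(hyp: str, ref: str) -> float:
    """Toy stand-in for a real utility metric (e.g. chrF): token-set overlap."""
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / len(h | r)

def mbr_decode(samples, utility=jaccard_utility):
    """Standard MBR: each hypothesis is scored against all pseudo-references,
    so the number of utility calls grows quadratically in len(samples)."""
    def expected_utility(hyp):
        return sum(utility(hyp, ref) for ref in samples) / len(samples)
    return max(samples, key=expected_utility)
```

With 1024 samples this sketch would make over a million utility calls, which is what makes neural metrics like COMET expensive in this setting.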
The method works by combining the representations of multiple references into a single aggregate reference representation, which is then used for utility estimation. This approach is applied to two common metrics: chrF, which is based on character n-gram overlap, and COMET, a neural metric trained on human judgments of translation quality. For chrF, reference aggregation reduces the time needed for computing the utility of 1024 samples by 99.5% without affecting translation quality. For COMET, computation time is reduced by 95–99%, making reference aggregation an efficient method for hypothesis pruning.
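For an embedding-based metric, the aggregation idea can be sketched as follows: instead of scoring each hypothesis against n reference embeddings, the reference embeddings are averaged into one vector first, so each hypothesis is scored only once. This is a simplified illustration under stated assumptions; real COMET applies a learned scoring network on top of sentence embeddings, and cosine similarity here is only a stand-in.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Stand-in scoring function (real COMET uses a trained network)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def aggregate_mbr(hyp_embs: np.ndarray, ref_embs: np.ndarray) -> int:
    """O(n) utility estimation: average the reference embeddings once, then
    score each hypothesis a single time against the aggregate."""
    agg_ref = ref_embs.mean(axis=0)           # aggregate reference representation
    scores = [cosine(h, agg_ref) for h in hyp_embs]
    return int(np.argmax(scores))             # index of the selected hypothesis
```

In sampling-based MBR the same set of sentences serves as both hypotheses and pseudo-references, so `hyp_embs` and `ref_embs` would typically be the same array.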
The paper evaluates the effectiveness of reference aggregation on four translation directions and two utility metrics. The results show that reference aggregation significantly reduces the computational complexity of MBR decoding while maintaining high translation quality. The method is particularly effective for COMET, where it outperforms other efficiency techniques like N-by-S MBR. The authors also propose an aggregate-to-fine MBR approach, which first prunes the number of hypotheses using an aggregate reference and then selects the best hypothesis using standard MBR.
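The two-stage aggregate-to-fine procedure can be sketched as below. The helper names, the pruning size `k`, and the crude "pooled token set" aggregate are illustrative assumptions (for chrF the paper aggregates n-gram statistics; a toy token-overlap utility stands in for the real metric).

```python
def token_overlap(hyp: str, ref: str) -> float:
    """Toy stand-in utility: token-set Jaccard overlap."""
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / len(h | r)

def aggregate_to_fine_mbr(samples, k=2):
    """Stage 1: cheap O(n) pruning against an aggregate 'reference' (here, the
    pooled tokens of all samples). Stage 2: standard pairwise MBR over the
    surviving top-k hypotheses, costing O(k * n) instead of O(n^2)."""
    pooled = " ".join(samples)  # crude aggregate: union of all sample tokens
    pruned = sorted(samples, key=lambda h: token_overlap(h, pooled),
                    reverse=True)[:k]
    return max(pruned,
               key=lambda h: sum(token_overlap(h, r) for r in samples))
```

Because the fine stage still scores the pruned hypotheses against all samples with the exact metric, the final selection closely matches standard MBR while most of the pairwise computation is avoided.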
The study demonstrates that reference aggregation is a successful strategy to overcome the quadratic complexity of MBR. However, it is still slower than beam search, as the cost of sampling is now the dominant factor. Future work could focus on improving sampling efficiency, such as using fewer hypotheses, improved caching, or speculative sampling approaches. The paper also discusses limitations, including the requirement for utility metrics based on averageable representations and the need for empirical evaluation of aggregation effectiveness for trained metrics.