The paper "From LLM to NMT: Advancing Low-Resource Machine Translation with Claude" by Maxim Enis and Mark Hopkins from Williams College explores the capabilities of Claude 3 Opus, a large language model (LLM) released by Anthropic in March 2024, in machine translation. The authors find that Claude 3 Opus exhibits stronger machine translation competence compared to other LLMs, particularly in low-resource settings. They address the issue of data contamination on the FLORES-200 benchmark but curate new benchmarks to validate Claude's effectiveness. The study highlights Claude's *resource efficiency*, meaning its performance depends less on the resource level of the language pair. Additionally, the authors demonstrate that advancements in LLM translation can be compressed into traditional neural machine translation (NMT) models through knowledge distillation. Using synthetic data generated by Claude, they show that knowledge distillation can advance the state-of-the-art in Yoruba-English translation, meeting or surpassing strong baselines like NLLB-54B and Google Translate. The paper contributes to the field by providing evidence of Claude's superior performance in low-resource translation and by showing how LLMs can be leveraged to improve NMT models.The paper "From LLM to NMT: Advancing Low-Resource Machine Translation with Claude" by Maxim Enis and Mark Hopkins from Williams College explores the capabilities of Claude 3 Opus, a large language model (LLM) released by Anthropic in March 2024, in machine translation. The authors find that Claude 3 Opus exhibits stronger machine translation competence compared to other LLMs, particularly in low-resource settings. They address the issue of data contamination on the FLORES-200 benchmark but curate new benchmarks to validate Claude's effectiveness. The study highlights Claude's *resource efficiency*, meaning its performance depends less on the resource level of the language pair. Additionally, the authors demonstrate that advancements in LLM translation can be compressed into traditional neural machine translation (NMT) models through knowledge distillation. Using synthetic data generated by Claude, they show that knowledge distillation can advance the state-of-the-art in Yoruba-English translation, meeting or surpassing strong baselines like NLLB-54B and Google Translate. The paper contributes to the field by providing evidence of Claude's superior performance in low-resource translation and by showing how LLMs can be leveraged to improve NMT models.