Adapting Large Language Models for Document-Level Machine Translation


9 Jun 2024 | Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, Gholamreza Haffari
This study investigates the adaptation of large language models (LLMs) for document-level machine translation (DocMT) across nine language pairs, covering two fine-tuning methods, three LLM backbones, and 18 translation tasks. Key findings include:

1. **Performance**: Specialized models can sometimes outperform GPT-4 on certain translation tasks, but they still suffer from issues such as off-target translations caused by error propagation during decoding.
2. **Fine-tuning strategies**: Parameter-efficient fine-tuning (PEFT) generally yields better translation quality than full fine-tuning (FFT), whereas FFT is more data-efficient, requiring only about 1% of the total dataset.
3. **Evaluation on recent test sets**: When data-leakage risks are mitigated, LLM-based DocMT models generalize better to out-of-domain text than conventional DocMT models.
4. **Advantages of base LLMs**: Base LLMs used as backbones for task-specific supervised fine-tuning outperform instruction-tuned LLMs and exhibit more effective zero-shot cross-lingual transfer.

The study also provides an in-depth analysis of translation errors, discourse phenomena, training strategies, the scaling law of parallel documents, and zero-shot cross-lingual transfer, highlighting both the strengths and limitations of LLM-based DocMT models.
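To make the contrast between the two fine-tuning strategies concrete, the sketch below shows how PEFT (via LoRA adapters) differs from full fine-tuning in terms of which parameters are updated. This is a minimal illustration using the Hugging Face `transformers` and `peft` libraries, not the authors' exact configuration; the backbone name and LoRA hyperparameters are placeholders.

```python
# Minimal sketch: PEFT (LoRA) vs. full fine-tuning of an LLM backbone.
# Assumes the `transformers` and `peft` packages are installed; the model
# name and hyperparameters below are illustrative, not from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# PEFT: freeze the backbone and train only small low-rank adapter matrices.
lora_config = LoraConfig(
    r=16,                                  # rank of the LoRA updates (illustrative)
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()    # typically well under 1% of all weights

# FFT would instead update every parameter of `model` directly, which the
# study finds to be more data-efficient but generally lower-scoring than
# PEFT for document-level translation.
```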