7 Feb 2024 | Roman Koshkin, Katsuhito Sudoh, Satoshi Nakamura
The paper "TRANSLLAMA: LLM-based Simultaneous Translation System" by Roman Koshkin, Katsuhito Sudoh, and Satoshi Nakamura explores the use of decoder-only large language models (LLMs) for simultaneous machine translation (SiMT). The authors demonstrate that pre-trained LLMs can be fine-tuned on causally aligned source-target sentence pairs to control input segmentation directly by generating a special "wait" token, eliminating the need for a separate policy. This approach enables the LLMs to perform English-German and English-Russian SiMT tasks with BLEU scores comparable to state-of-the-art baselines. The study also evaluates closed-source models like GPT-4, showing encouraging results in zero-shot performance, indicating potential for enhancing future SiMT systems. The main contributions include a method to fine-tune LLMs for SiMT tasks and demonstrating that LLMs can perform both simultaneous translation and input segmentation without a separate policy, achieving performance comparable to or exceeding state-of-the-art baselines. The paper discusses the architecture, fine-tuning data preparation, and training procedures, and evaluates the system's performance on English-to-German and English-to-Russian language pairs.The paper "TRANSLLAMA: LLM-based Simultaneous Translation System" by Roman Koshkin, Katsuhito Sudoh, and Satoshi Nakamura explores the use of decoder-only large language models (LLMs) for simultaneous machine translation (SiMT). The authors demonstrate that pre-trained LLMs can be fine-tuned on causally aligned source-target sentence pairs to control input segmentation directly by generating a special "wait" token, eliminating the need for a separate policy. This approach enables the LLMs to perform English-German and English-Russian SiMT tasks with BLEU scores comparable to state-of-the-art baselines. The study also evaluates closed-source models like GPT-4, showing encouraging results in zero-shot performance, indicating potential for enhancing future SiMT systems. The main contributions include a method to fine-tune LLMs for SiMT tasks and demonstrating that LLMs can perform both simultaneous translation and input segmentation without a separate policy, achieving performance comparable to or exceeding state-of-the-art baselines. The paper discusses the architecture, fine-tuning data preparation, and training procedures, and evaluates the system's performance on English-to-German and English-to-Russian language pairs.