27 Feb 2024 | Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G.C. de Souza, André F.T. Martins
The paper introduces TOWER, an open multilingual large language model (LLM) designed for translation-related tasks. The authors propose a method for tailoring LLMs to multiple translation workflows. First, they extend the multilingual capabilities of LLaMA-2 through continued pretraining on a mixture of monolingual and parallel data, creating TOWERBASE. They then fine-tune this model on a dataset of high-quality, diverse instructions for translation-related tasks, resulting in TOWERINSTRUCT. TOWERINSTRUCT consistently outperforms open alternatives on various translation tasks and is competitive with closed-source models such as GPT-4. The paper also includes a detailed evaluation of the model's performance on translation quality, automatic post-editing, named entity recognition, and grammatical error correction. The authors release the TOWER models, the specialization dataset TOWERBLOCKS, the evaluation framework TOWEREVAL, and a collection of model generations to facilitate future research.