TOWER: An Open Multilingual Large Language Model for Translation-Related Tasks

27 Feb 2024 | Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G.C. de Souza, André F.T. Martins
The paper introduces TOWER, an open multilingual large language model (LLM) designed for translation-related tasks. The model is built in two stages: first, the pretraining of LLaMA-2 is continued on a multilingual dataset of 20 billion tokens, yielding TOWERBASE; second, this base model is finetuned on TOWERBLOCKS, a supervised dataset of translation-related tasks, yielding TOWERINSTRUCT. TOWERINSTRUCT outperforms open alternatives on several translation-related tasks and is competitive with general-purpose closed LLMs such as GPT-4 and GPT-3.5-turbo. Alongside the paper, the authors release the TOWER models, the TOWERBLOCKS dataset, an evaluation framework for LLMs focused on translation, and a collection of model generations for benchmarking.

TOWERBASE extends the multilingual capabilities of LLaMA-2 by incorporating both monolingual and parallel data during continued pretraining, which improves translation quality. TOWERBLOCKS is a dataset curated to specialize LLMs for translation-related tasks, covering tasks such as automatic post-editing, grammatical error correction, and named entity recognition; finetuning on it produces TOWERINSTRUCT, an instruction-following model tailored to translation workflows.
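As a concrete illustration of how an instruction-tuned translation model like TOWERINSTRUCT can be used, the sketch below prompts it for zero-shot translation through the Hugging Face transformers library. The checkpoint ID and the exact chat format are assumptions for illustration, not details stated in this summary; adjust them to whichever released checkpoint you actually load.

```python
# Minimal sketch: zero-shot translation with an instruction-tuned model via transformers.
# Assumption: the released checkpoint is available on the Hugging Face Hub under an ID
# like "Unbabel/TowerInstruct-7B-v0.1" and ships a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Unbabel/TowerInstruct-7B-v0.1"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Phrase the translation request as a single-turn instruction.
messages = [
    {
        "role": "user",
        "content": (
            "Translate the following text from Portuguese into English.\n"
            "Portuguese: Um grupo de investigadores lançou um novo modelo.\n"
            "English:"
        ),
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128, do_sample=False)

# Decode only the generated continuation, dropping the prompt tokens.
print(tokenizer.decode(output[0, inputs.shape[1]:], skip_special_tokens=True))
```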
TOWERINSTRUCT is evaluated on machine translation, automatic post-editing, grammatical error correction, and named entity recognition. It consistently achieves higher translation quality than open alternatives, is competitive with closed models, and outperforms open models on automatic post-editing, grammatical error correction, and named entity recognition. The results also highlight the importance of including parallel data during pretraining and the benefit of adding conversational and coding data to TOWERBLOCKS.

The paper further analyzes how individual design choices affect performance: continued pretraining and supervised finetuning each contribute to the improvements, and the analysis examines the role of parallel data during pretraining as well as transfer and interference relations between tasks. The findings suggest that parallel data in pretraining is sample efficient yet continues to improve translation quality as more tokens are added. The paper concludes that TOWER is a promising open multilingual LLM for translation-related tasks, outperforming open alternatives and remaining competitive with closed models.
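The summary does not describe the released evaluation framework in detail, but translation quality in this line of work is commonly measured with neural metrics such as COMET. The sketch below shows one way such scoring could be done with the open-source unbabel-comet package; the metric checkpoint name and the toy data are assumptions, and the paper's own framework may differ.

```python
# Hedged sketch: scoring machine-translation outputs with a COMET metric,
# as one might do when benchmarking translation systems against each other.
# Assumption: the unbabel-comet package and the "Unbabel/wmt22-comet-da"
# checkpoint are used; this is not necessarily the paper's exact setup.
from comet import download_model, load_from_checkpoint

# Download and load the metric checkpoint (cached locally after the first call).
model_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(model_path)

# Each sample pairs a source sentence, a system hypothesis, and a reference.
data = [
    {
        "src": "Um grupo de investigadores lançou um novo modelo.",
        "mt": "A group of researchers released a new model.",
        "ref": "A group of researchers has released a new model.",
    },
]

# predict() returns segment-level scores plus a corpus-level system score.
result = comet_model.predict(data, batch_size=8, gpus=0)
print(result.system_score)
```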