11 Mar 2021 | Mihir Kale, Linting Xue*, Noah Constant*, Adam Roberts*, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
The paper introduces mT5, a multilingual variant of the Text-to-Text Transfer Transformer (T5) model, which is pre-trained on a Common Crawl-based dataset covering 101 languages. mT5 is designed to address the limitations of existing multilingual models by following the same recipe as T5, including its text-to-text format, design principles, and scale. The authors detail the modifications made for mT5, such as the introduction of the mC4 dataset, and demonstrate its state-of-the-art performance on various multilingual benchmarks. They also describe a technique to prevent "accidental translation" in zero-shot settings, where the model incorrectly translates predictions into the wrong language. The code and model checkpoints are publicly available. The paper highlights the importance of model capacity in cross-lingual representation learning and suggests that scaling up a simple pre-training recipe can be a viable alternative to more complex techniques.
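To make the text-to-text framing concrete, here is a minimal sketch of how the released checkpoints might be loaded and queried. The Hugging Face transformers interface and the "google/mt5-small" checkpoint name are assumptions on my part rather than details from the paper, and the pre-trained model would normally be fine-tuned on a downstream task before it produces useful outputs.

```python
# Minimal sketch of mT5's text-to-text usage, assuming the public checkpoints
# are available via the Hugging Face transformers library ("google/mt5-small"
# is an assumed checkpoint name, not taken from the paper).
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Every task is cast as text in, text out -- here a hypothetical XNLI-style prompt.
# Note: the raw pre-trained checkpoint has only seen the span-corruption
# objective, so it is typically fine-tuned before generation is meaningful.
inputs = tokenizer("xnli: premise: ... hypothesis: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```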