4 Jun 2019 | Telmo Pires, Eva Schlinger, Dan Garrette
Multilingual BERT (M-BERT) is a pre-trained language model that has shown surprising ability in zero-shot cross-lingual transfer. This paper investigates how well M-BERT generalizes across languages and explores the nature of its multilingual representations. The model is trained on monolingual Wikipedia corpora from 104 languages with a single shared WordPiece vocabulary and no explicit cross-lingual objective. It can nevertheless transfer knowledge between languages written in different scripts, and transfer works best between typologically similar languages, indicating that it captures genuinely multilingual representations. However, it also exhibits systematic deficiencies for certain language pairs.
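A minimal sketch (not code from the paper) of probing this shared representation space: encode roughly parallel sentences in different languages with M-BERT and compare mean-pooled hidden states by cosine similarity. The model name is the public `bert-base-multilingual-cased` checkpoint; the pooling choice and example sentences are assumptions for illustration.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the final-layer hidden states into one sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

en = embed("The weather is nice today.")
de = embed("Das Wetter ist heute schön.")
ja = embed("今日は天気がいいです。")

cos = torch.nn.CosineSimilarity(dim=0)
print(f"en-de similarity: {cos(en, de).item():.3f}")
print(f"en-ja similarity: {cos(en, ja).item():.3f}")
```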
The paper presents several experiments showing that M-BERT can perform well in zero-shot transfer, even when there is little or no lexical overlap between the fine-tuning and evaluation languages. For example, it achieves high accuracy in named entity recognition (NER) and part-of-speech (POS) tagging across multiple languages, including pairs written in different scripts. However, its accuracy drops for language pairs with different word order, suggesting that it does not learn systematic transformations of linguistic structure to accommodate a target language whose word order differs from the source.
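One way the paper quantifies the role of shared vocabulary is an entity-overlap statistic: the fraction of entities in the evaluation language whose surface forms also appear in the fine-tuning language's training data, which it then correlates with zero-shot NER F1. The sketch below is an assumed, simplified version of that measurement with made-up entity lists, not the paper's own code.

```python
def entity_overlap(train_entities: list[str], eval_entities: list[str]) -> float:
    """Share of eval entities whose surface form also occurs in the training set."""
    train_set = {e.lower() for e in train_entities}
    if not eval_entities:
        return 0.0
    hits = sum(1 for e in eval_entities if e.lower() in train_set)
    return hits / len(eval_entities)

# Toy example: only "Angela Merkel" matches exactly across the two lists.
train = ["Berlin", "Angela Merkel", "European Union"]
evaluation = ["Berlín", "Angela Merkel", "Unión Europea", "Madrid"]
print(f"overlap: {entity_overlap(train, evaluation):.2%}")  # 25.00%
```

The paper's finding that transfer works even when this overlap is low is what supports the claim that M-BERT's representations go beyond vocabulary memorization.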
The paper also explores whether M-BERT can generalize from monolingual inputs to code-switched and transliterated text. While it performs reasonably well on code-switched data whose words remain in their original scripts, it struggles with transliterated text, indicating that it depends on having seen a language in its pre-training script for effective transfer. The results suggest that M-BERT's multilingual representation is deeper than simple vocabulary memorization, but that it still has clear limitations.
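A small sketch (the example sentences are assumptions, not from the paper's data) of why transliteration is hard for M-BERT: the same Hindi-English code-switched sentence tokenizes into Devanagari WordPieces seen during pre-training when kept in its original script, but into generic Latin sub-pieces once transliterated, a form the model never saw associated with Hindi.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

in_script      = "मुझे यह movie बहुत पसंद आई"           # Hindi in Devanagari + English word
transliterated = "mujhe yah movie bahut pasand aayi"   # same sentence, Latin script

print(tokenizer.tokenize(in_script))
print(tokenizer.tokenize(transliterated))
```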
The paper concludes that M-BERT's ability to generalize cross-lingually is underpinned by a multilingual representation, but that this representation is not sufficient for all types of cross-lingual transfer. The study highlights the importance of exploring multilingual training objectives to improve cross-lingual transfer performance.