How multilingual is Multilingual BERT?

4 Jun 2019 | Telmo Pires, Eva Schlinger, Dan Garrette
This paper investigates the cross-lingual transfer capabilities of Multilingual BERT (M-BERT), a pre-trained model that supports 104 languages. The authors conduct a series of experiments to explore how M-BERT generalizes across languages, including those written in different scripts and with varying typological features. Key findings include:

1. **Zero-Shot Cross-Lingual Transfer**: M-BERT performs well in zero-shot cross-lingual model transfer, achieving high accuracy on tasks like Named Entity Recognition (NER) and Part-of-Speech (POS) tagging across multiple languages (a minimal evaluation sketch follows this summary).
2. **Vocabulary Memorization**: M-BERT's performance is not heavily dependent on lexical overlap between the fine-tuning and evaluation languages, indicating that it learns multilingual representations that go deeper than simple vocabulary memorization.
3. **Cross-Script Transfer**: M-BERT transfers well between languages written in different scripts, suggesting it captures multilingual representations that can map structures onto new vocabularies.
4. **Typological Similarity**: Transfer is more effective between typologically similar languages, indicating that M-BERT's multilingual representation maps learned structures onto new vocabularies more easily when the languages share grammatical features such as word order.
5. **Code-Switching and Transliteration**: M-BERT handles code-switched text reasonably well, but struggles when the target text is transliterated into another script, suggesting that explicit multilingual training objectives may be necessary for better performance in these scenarios.
6. **Multilingual Characterization of the Feature Space**: Experiments show that M-BERT's hidden representations form a shared subspace across languages, indicating a language-agnostic component that captures useful linguistic information (see the retrieval sketch below).

The authors conclude that M-BERT's robust cross-lingual generalization is underpinned by a multilingual representation, but further improvements may require explicit multilingual training objectives.
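To make the zero-shot setup concrete, here is a minimal sketch using the Hugging Face `transformers` library: a token-classification head on top of M-BERT is fine-tuned on English data only, then applied directly to a sentence in an unseen language. The checkpoint name `mbert-finetuned-en-pos` and its label set are hypothetical placeholders, not the authors' code or a released model.

```python
# Minimal sketch of zero-shot cross-lingual evaluation with M-BERT.
# "mbert-finetuned-en-pos" is a hypothetical checkpoint: M-BERT
# (bert-base-multilingual-cased) fine-tuned for POS tagging on English only.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

CHECKPOINT = "mbert-finetuned-en-pos"  # hypothetical English-only fine-tune
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForTokenClassification.from_pretrained(CHECKPOINT)
model.eval()

# Evaluate on a language never seen during fine-tuning (here: German).
sentence = "Der schnelle braune Fuchs springt über den faulen Hund."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_wordpieces, num_labels)

pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, pred_ids):
    if token in tokenizer.all_special_tokens:
        continue  # skip [CLS] and [SEP]
    print(f"{token:15s} {model.config.id2label[pred]}")
```

In the paper's zero-shot setting, accuracy on such unseen languages is computed against gold annotations with no target-language fine-tuning at all.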
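The feature-space finding can be illustrated with a small retrieval probe: encode translation pairs with off-the-shelf M-BERT, pool one layer's hidden states per sentence, and check how often a sentence's nearest neighbor in the other language is its true translation. This is a simplified sketch; the paper uses WMT16 sentence pairs and applies a mean "translation vector" shift between languages before searching, which is omitted here, and the sentence pairs below are illustrative placeholders.

```python
# Sketch of a nearest-neighbor retrieval probe over M-BERT hidden states,
# in the spirit of the paper's feature-space experiment (simplified).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def sentence_vector(text: str, layer: int = 8) -> torch.Tensor:
    """Average one layer's hidden states over non-special tokens."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer][0]  # (seq_len, dim)
    keep = torch.tensor(
        [t not in tokenizer.all_special_tokens
         for t in tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])]
    )
    return hidden[keep].mean(dim=0)

# Placeholder EN-DE translation pairs (illustrative only).
en = ["The cat sleeps on the sofa.", "I bought fresh bread today."]
de = ["Die Katze schläft auf dem Sofa.", "Ich habe heute frisches Brot gekauft."]

E = torch.stack([sentence_vector(s) for s in en])
D = torch.stack([sentence_vector(s) for s in de])

# Cosine similarity between every EN vector and every DE vector; count how
# often the true translation is the nearest neighbor.
sims = torch.nn.functional.normalize(E, dim=-1) @ torch.nn.functional.normalize(D, dim=-1).T
accuracy = (sims.argmax(dim=1) == torch.arange(len(en))).float().mean()
print(f"nearest-neighbor accuracy: {accuracy:.2f}")
```

High retrieval accuracy at intermediate layers is what motivates the paper's claim that M-BERT's representations share a cross-lingual subspace.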