3 Jun 2024
**LANGBRIDGE: Multilingual Reasoning Without Multilingual Supervision**
**Authors:** Dongkeun Yoon, Joel Jang, Sungdong Kim, Seungone Kim, Sheikh Shafayat, Minjoon Seo
**Institutions:** KAIST, University of Washington, NAVER AI Lab, Carnegie Mellon University
**Abstract:**
LANGBRIDGE is a zero-shot approach for adapting language models to multilingual reasoning tasks without multilingual supervision. It bridges two specialized models: one for understanding many languages (e.g., an mT5 encoder) and one for reasoning (e.g., Orca 2). Connecting them requires only a small number of trainable parameters, yet it improves performance on low-resource languages in mathematical reasoning, code completion, logical reasoning, and commonsense reasoning. Despite being trained solely on English data, LANGBRIDGE substantially improves multilingual reasoning; the authors attribute this to the language-agnostic nature of multilingual representations.
**Introduction:**
Language models (LMs) often struggle with low-resource languages because those languages are underrepresented in their training data. Prior work adapts English-centric LMs to other languages through continual training, but scaling this approach to a large number of languages is challenging. Drawing on the multimodal literature on integrating independently pretrained models, LANGBRIDGE aligns a multilingual encoder with an LM using only English data, enhancing multilingual reasoning without multilingual supervision.
**Related Work:**
- **English-centric Language Models:** LMs specialized for reasoning are commonly adapted from English-centric base LMs and therefore inherit their limited proficiency in low-resource languages.
- **Zero-shot Cross-lingual Transfer:** Multilingual models can handle tasks across multiple languages after being finetuned on high-resource languages.
- **Aligning Pretrained Representations:** Combining independently pretrained representations has been explored in cross-modal alignment, but LANGBRIDGE focuses on aligning a multilingual encoder with an LM.
**LANGBRIDGE:**
- **Hypothesis:** Since the representations of multilingual encoders are largely language-agnostic, aligning such an encoder to an LM should allow the LM to understand the semantics of all languages the encoder supports, even without multilingual supervision.
- **Model Architecture:** LANGBRIDGE maps the final hidden states of the multilingual encoder into the LM's input embedding space with a single trainable linear layer, so that they act as soft prompts; one additional trainable token embedding is appended (see the sketch below).
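Below is a minimal PyTorch sketch of how such a bridge could be wired up, assuming HuggingFace `transformers` with an mT5 encoder and a causal reasoning LM. The class name, model choices, and training details (frozen encoder and LM, English-only LM loss) are illustrative assumptions based on the description above, not the authors' code.

```python
# Minimal sketch of a LangBridge-style architecture (assumed setup, not the authors' code).
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, MT5EncoderModel


class LangBridge(nn.Module):
    def __init__(self, enc_name: str, lm_name: str):
        super().__init__()
        self.encoder = MT5EncoderModel.from_pretrained(enc_name)   # multilingual encoder
        self.lm = AutoModelForCausalLM.from_pretrained(lm_name)    # reasoning LM
        # Both pretrained models stay frozen; only the bridge parameters are trained.
        for p in self.encoder.parameters():
            p.requires_grad = False
        for p in self.lm.parameters():
            p.requires_grad = False
        d_enc = self.encoder.config.d_model
        d_lm = self.lm.config.hidden_size
        # Single linear layer mapping encoder states into the LM's embedding space.
        self.proj = nn.Linear(d_enc, d_lm)
        # One extra trainable embedding appended to the projected soft prompt.
        self.extra_token = nn.Parameter(torch.zeros(1, 1, d_lm))

    def forward(self, enc_ids, enc_mask, lm_ids, lm_mask, labels=None):
        # Encode the (possibly non-English) prompt with the multilingual encoder.
        enc_out = self.encoder(input_ids=enc_ids, attention_mask=enc_mask).last_hidden_state
        soft_prompt = self.proj(enc_out)                             # (B, L_enc, d_lm)
        extra = self.extra_token.expand(soft_prompt.size(0), -1, -1)
        # The LM sees [projected encoder states; extra token; its own token embeddings].
        lm_embeds = self.lm.get_input_embeddings()(lm_ids)
        inputs_embeds = torch.cat([soft_prompt, extra, lm_embeds], dim=1)
        attn = torch.cat([enc_mask, torch.ones_like(enc_mask[:, :1]), lm_mask], dim=1)
        if labels is not None:
            # Standard LM loss on English-only targets; soft-prompt positions are ignored.
            ignore = torch.full(
                (labels.size(0), soft_prompt.size(1) + 1), -100,
                dtype=labels.dtype, device=labels.device,
            )
            labels = torch.cat([ignore, labels], dim=1)
        return self.lm(inputs_embeds=inputs_embeds, attention_mask=attn, labels=labels)
```

Because only the linear projection and the extra token receive gradients, training on English data alone can suffice: the frozen encoder already places all of its supported languages in a shared representation space, and the bridge only has to align that space with the LM.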
**Experiments:**
- **Mathematical Reasoning:** LANGBRIDGE markedly improves LMs' mathematical reasoning performance in underrepresented, low-resource languages.
- **Code Completion:** LANGBRIDGE models show consistent improvements over existing models across all underrepresented languages.
- **Logical Reasoning:** LANGBRIDGE models perform well on tasks requiring intrinsic linguistic understanding, demonstrating robustness in grasping nuanced linguistic details.
**Analysis:**
- **PCA:** Principal component analysis of the bridged representations shows that LANGBRIDGE models are language-agnostic, mapping inputs from all languages into a single cluster (a minimal visualization sketch follows this list).
- **Accidental Translations:** Some LANGBRIDGE models occasionally produce accidental translations of the input, further suggesting that multiple languages map to similar internal representations.
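As a rough illustration of the PCA analysis, the snippet below (assuming the hypothetical `LangBridge` class from the earlier sketch, plus scikit-learn and matplotlib) mean-pools the bridged representations of parallel sentences in several languages and projects them to two dimensions; if the representations are language-agnostic, all languages should fall into one mixed cluster rather than per-language clusters.

```python
# Minimal sketch of a PCA visualization of bridged representations (assumed setup).
import torch
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt


@torch.no_grad()
def plot_language_clusters(model, tokenizer, parallel_sentences):
    """model: the hypothetical LangBridge module above.
    tokenizer: the multilingual encoder's tokenizer.
    parallel_sentences: dict mapping language code -> list of parallel sentences."""
    feats, langs = [], []
    for lang, sents in parallel_sentences.items():
        batch = tokenizer(sents, return_tensors="pt", padding=True, truncation=True)
        enc = model.encoder(**batch).last_hidden_state        # (B, L, d_enc)
        soft = model.proj(enc)                                 # bridged representations
        mask = batch["attention_mask"].unsqueeze(-1)
        pooled = (soft * mask).sum(1) / mask.sum(1)            # mean pool over tokens
        feats.append(pooled)
        langs.extend([lang] * pooled.size(0))
    feats = torch.cat(feats).float().numpy()
    coords = PCA(n_components=2).fit_transform(feats)          # project to 2D
    for lang in parallel_sentences:
        idx = [i for i, l in enumerate(langs) if l == lang]
        plt.scatter(coords[idx, 0], coords[idx, 1], label=lang, s=10)
    plt.legend()
    plt.title("PCA of bridged representations across languages")
    plt.show()
```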
**Conclusion:**
LANGBRIDGE effectively extends the reasoning capabilities of language models to many languages without any multilingual supervision. By aligning a multilingual encoder to an LM with only English training data and a small number of trainable parameters, it improves mathematical, code, logical, and commonsense reasoning in low-resource languages, an effect the authors attribute to the language-agnostic nature of multilingual representations.