LANGBRIDGE: Multilingual Reasoning Without Multilingual Supervision

3 Jun 2024 | Dongkeun Yoon¹, Joel Jang², Sungdong Kim¹﹐³, Seungone Kim¹﹐⁴, Sheikh Shafayat¹, Minjoon Seo¹ (¹KAIST, ²University of Washington, ³NAVER AI Lab, ⁴Carnegie Mellon University)
**Abstract:** LANGBRIDGE is a zero-shot approach for adapting language models to multilingual reasoning tasks without multilingual supervision. It bridges two specialized models: one that understands many languages (e.g., an mT5 encoder) and one that reasons well (e.g., Orca 2). LANGBRIDGE connects the two with minimal trainable parameters and improves performance on low-resource languages in mathematical reasoning, code completion, logical reasoning, and commonsense reasoning. Although it is trained on English data only, LANGBRIDGE substantially improves multilingual reasoning, which the authors attribute to the language-agnostic nature of multilingual encoder representations.

**Introduction:** Language models (LMs) often struggle with low-resource languages because those languages are scarce in their training data. Previous methods adapt English-centric LMs to other languages through continual training, but this becomes impractical as the number of target languages grows. Drawing on the multimodal literature on integrating independently pretrained models, LANGBRIDGE aligns a multilingual encoder with an LM using only English data, enhancing multilingual reasoning without multilingual supervision.

**Related Work:**
- **English-centric Language Models:** Multilingual LMs are often adapted from English-centric LMs through continual training and inherit their limited proficiency in low-resource languages.
- **Zero-shot Cross-lingual Transfer:** Multilingual models finetuned on a high-resource language can handle the same task in other languages.
- **Aligning Pretrained Representations:** Combining independently pretrained representations has mainly been explored for cross-modal alignment; LANGBRIDGE instead aligns a multilingual encoder with an LM.

**LANGBRIDGE:**
- **Hypothesis:** Because multilingual encoder representations are largely language-agnostic, aligning such an encoder with an LM on English data alone should let the LM understand the semantics of all languages the encoder supports, without multilingual supervision.
- **Model Architecture:** LANGBRIDGE maps the final hidden states of the multilingual encoder to soft prompts for the LM, using a single linear layer plus one additional trainable token (a minimal sketch follows below).
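The architecture description above can be made concrete with a short sketch. This is a minimal illustration, not the authors' released code: the class and attribute names (`LangBridge`, `proj`, `extra_token`), the Hugging Face-style `last_hidden_state` / `inputs_embeds` interfaces, and all dimensions are assumptions; only the overall wiring (frozen multilingual encoder, a single trainable linear projection, one extra trainable token, frozen LM) follows the summary.

```python
# Minimal sketch of a LangBridge-style bridge module (illustrative assumptions:
# HF-style encoder/LM interfaces; names and dimensions are hypothetical).
import torch
import torch.nn as nn


class LangBridge(nn.Module):
    """Bridges a frozen multilingual encoder (e.g., an mT5 encoder) with a
    frozen English-centric LM (e.g., Orca 2) via a single linear projection."""

    def __init__(self, encoder: nn.Module, lm: nn.Module, enc_dim: int, lm_dim: int):
        super().__init__()
        self.encoder = encoder.eval()   # frozen multilingual encoder
        self.lm = lm.eval()             # frozen reasoning LM
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        for p in self.lm.parameters():
            p.requires_grad_(False)
        # The only trainable pieces: a linear map into the LM's embedding space
        # and one extra trainable token appended to the soft prompt.
        self.proj = nn.Linear(enc_dim, lm_dim)
        self.extra_token = nn.Parameter(torch.randn(1, 1, lm_dim) * 0.02)

    def soft_prompt(self, input_ids, attention_mask):
        """Final encoder hidden states -> soft prompt in the LM's embedding space."""
        with torch.no_grad():
            enc_out = self.encoder(input_ids=input_ids,
                                   attention_mask=attention_mask).last_hidden_state
        prompt = self.proj(enc_out)                               # (B, T, lm_dim)
        extra = self.extra_token.expand(prompt.size(0), -1, -1)  # (B, 1, lm_dim)
        return torch.cat([prompt, extra], dim=1)

    def forward(self, input_ids, attention_mask, target_embeds):
        """Prepend the soft prompt to the (English) target embeddings and let
        the frozen LM produce logits over the continuation."""
        inputs_embeds = torch.cat(
            [self.soft_prompt(input_ids, attention_mask), target_embeds], dim=1
        )
        return self.lm(inputs_embeds=inputs_embeds)
```

At inference time, a non-English question would be tokenized with the encoder's tokenizer, turned into a soft prompt by `soft_prompt`, and fed to the LM's generation loop via `inputs_embeds`. Because only `proj` and `extra_token` are trained, the English-only alignment data never has to cover the encoder's other languages.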
**Experiments:**
- **Mathematical Reasoning:** LANGBRIDGE markedly improves LMs' performance on multilingual mathematical reasoning, with the largest gains on underrepresented languages.
- **Code Completion:** LANGBRIDGE models show consistent improvements across all underrepresented languages.
- **Logical Reasoning:** LANGBRIDGE models also perform well on tasks requiring intrinsic linguistic understanding, showing that they capture nuanced linguistic details.

**Analysis:**
- **PCA:** The output representations of LANGBRIDGE models are language-agnostic, mapping inputs from all languages into a single cluster (an illustrative sketch of this check appears after the Conclusion).
- **Accidental Translations:** Some LANGBRIDGE models occasionally produce accidental translations of non-English inputs, further suggesting that different languages share similar representations.

**Conclusion:** LANGBRIDGE effectively extends the reasoning capabilities of language models to low-resource languages without multilingual supervision, using only English training data and a small number of trainable parameters.
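As a supplement to the PCA observation above, the following sketch shows one way such a check could be run. The probing setup is an assumption, not the paper's exact procedure: `bridge.soft_prompt` refers to the hypothetical class sketched earlier, and the mean-pooling and sentence selection are illustrative.

```python
# Illustrative check of language-agnostic representations via PCA (assumed setup).
import torch
from sklearn.decomposition import PCA


def pca_by_language(bridge, tokenizer, sentences_by_lang, n_components=2):
    """Encode parallel sentences per language, mean-pool the bridged soft
    prompts, and project them to 2-D; language-agnostic representations should
    collapse into a single cluster rather than one cluster per language."""
    features, labels = [], []
    for lang, sentences in sentences_by_lang.items():
        batch = tokenizer(sentences, return_tensors="pt",
                          padding=True, truncation=True)
        with torch.no_grad():
            reps = bridge.soft_prompt(batch["input_ids"],
                                      batch["attention_mask"]).mean(dim=1)
        features.append(reps)
        labels.extend([lang] * len(sentences))
    coords = PCA(n_components=n_components).fit_transform(
        torch.cat(features).cpu().numpy()
    )
    return coords, labels  # scatter-plot coords colored by label to inspect clustering
```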