5 Jun 2024 | Yiyi Chen, Heather Lent, Johannes Bjerva
This paper explores the security of multilingual large language models (LLMs) through *multilingual* embedding inversion attacks, a novel extension of existing attacks, which focus on monolingual English models. The authors define and investigate black-box multilingual and cross-lingual inversion attacks and find that multilingual LLMs are more vulnerable to such attacks than monolingual models. Part of this vulnerability arises because defenses designed for monolingual models fail to protect multilingual ones. To address this, the authors propose a simple masking defense that is effective for both monolingual and multilingual models and requires no additional model training. The study is the first to systematically investigate multilingual inversion attacks, and it highlights how attack and defense strategies differ between monolingual and multilingual settings. The research underscores the need for a multilingual approach to LLM security and calls for further exploration of embedding inversion attacks in a broader range of languages.