Mixture-of-Subspaces in Low-Rank Adaptation

5 Jul 2024 | Taiqiang Wu, Jiahao Wang, Zhe Zhao, Ngai Wong
This paper introduces a subspace-inspired Low-Rank Adaptation (LoRA) method called Mixture-of-Subspaces LoRA (MoSLoRA). LoRA, a parameter-efficient fine-tuning technique, updates the weights of large language models (LLMs) using low-rank matrices. The authors decompose LoRA into subspaces and find that mixing these subspaces enhances performance. They further analyze the two-subspace mixing strategy in a fine-grained subspace view, showing that it is equivalent to using a fixed mixer matrix. MoSLoRA instead employs a learnable mixer to fuse more subspaces, improving flexibility and performance. Experiments on various tasks, including commonsense reasoning, visual instruction tuning, and subject-driven text-to-image generation, demonstrate that MoSLoRA consistently outperforms LoRA and other baselines. The method is computationally efficient, easy to implement, and applicable to large language, multimodal, and diffusion models.
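To make the idea concrete, here is a minimal PyTorch sketch of a MoSLoRA-style linear layer, not the authors' released code: vanilla LoRA computes W x + B A x with a down-projection A and an up-projection B, and MoSLoRA inserts a small learnable r-by-r mixer M between them, giving W x + B M A x. The class name MoSLoRALinear, the rank and scaling hyperparameters, and the initialization choices below are assumptions for this illustration.

```python
import torch
import torch.nn as nn

class MoSLoRALinear(nn.Module):
    """Frozen linear layer with a MoSLoRA-style low-rank update (sketch).

    LoRA:     y = W x + (alpha / r) * B A x
    MoSLoRA:  y = W x + (alpha / r) * B M A x
    where A is r x d_in, B is d_out x r, and M is a learnable r x r
    mixer that fuses all r subspaces instead of pairing them one-to-one.
    """

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        # Stand-in for the frozen pretrained weight matrix.
        self.weight = nn.Parameter(torch.empty(d_out, d_in), requires_grad=False)
        nn.init.kaiming_uniform_(self.weight)
        # Low-rank adapter: down-projection A, learnable mixer M, up-projection B.
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.mixer = nn.Parameter(torch.randn(r, r) * 0.01)
        # Zero-initialized B keeps the update at zero before training,
        # so the layer starts out identical to the pretrained model.
        self.B = nn.Parameter(torch.zeros(d_out, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T                               # frozen path: W x
        delta = ((x @ self.A.T) @ self.mixer.T) @ self.B.T     # adapter path: B M A x
        return base + self.scaling * delta
```

Since the mixer adds only r x r extra parameters per adapted layer (e.g., 64 parameters at r = 8), the overhead relative to vanilla LoRA is negligible, which is consistent with the paper's claim of computational efficiency.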