DoRA: Weight-Decomposed Low-Rank Adaptation
DoRA is a parameter-efficient fine-tuning (PEFT) method that uses weight decomposition to reach a learning capacity close to full fine-tuning (FT) without adding inference latency. It decomposes each pre-trained weight matrix into a magnitude component and a direction component, then applies LoRA to the direction component so that fine-tuning remains parameter-efficient. By separating magnitude from direction, DoRA exhibits learning patterns that resemble FT's behavior more closely than LoRA's.

Across model backbones such as LLMs and LVLMs, DoRA consistently outperforms LoRA on tasks including commonsense reasoning, visual instruction tuning, and image/video-text understanding. It is also compatible with other LoRA variants such as VeRA and can be combined with QLoRA to further reduce memory usage. Extensive experiments validate its effectiveness and efficiency for fine-tuning large language models, and the method also works well for text-to-image generation, suggesting broader applicability beyond language and vision.
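To make the magnitude/direction decomposition concrete, the following is a minimal, illustrative PyTorch sketch of a DoRA-style linear layer (not the official implementation). It assumes a frozen pretrained weight W0, a LoRA-style low-rank update B @ A applied to the direction, and a trainable per-output magnitude vector initialized from the norm of W0; names like `DoRALinear`, `rank`, and `alpha` are choices made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DoRALinear(nn.Module):
    """Illustrative DoRA-style linear layer (a sketch, not the official code).

    The frozen pretrained weight W0 is decomposed into a magnitude vector m
    (one entry per output unit) and a direction matrix. A low-rank update
    B @ A adapts the direction, and m is trained directly:
        W' = m * (W0 + B @ A) / ||W0 + B @ A||
    where the norm is taken per output row.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        out_features, in_features = base.weight.shape
        # Frozen pretrained weight; the bias (if any) is reused as-is.
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = base.bias
        # Low-rank factors for the directional update (LoRA-style init:
        # A small random, B zero, so the initial update is zero).
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank
        # Trainable magnitude, initialized to the per-row norm of W0.
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=1, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Directional component: pretrained weight plus the low-rank update.
        directional = self.weight + self.scaling * (self.lora_B @ self.lora_A)
        # Normalize each output row to unit norm, then rescale by the
        # learned magnitude to form the adapted weight.
        norm = directional.norm(p=2, dim=1, keepdim=True)
        adapted_weight = self.magnitude * directional / norm
        return F.linear(x, adapted_weight, self.bias)


# Example usage: wrap an existing linear layer with the DoRA-style adapter.
base = nn.Linear(768, 768)
layer = DoRALinear(base, rank=8)
y = layer(torch.randn(4, 768))
```

Because the adapted weight can be merged into a single matrix after training, this formulation adds no extra inference latency relative to the original layer, which is the same property the paper highlights for DoRA.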