Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

June 5, 2024 | Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu
This paper investigates the computational limits of Low-Rank Adaptation (LoRA) for fine-tuning transformer-based models using fine-grained complexity theory. The authors identify a phase transition behavior in the efficiency of LoRA adaptation, showing that efficient (sub-quadratic) approximation algorithms exist only below a certain norm threshold. They also prove the existence of nearly linear time algorithms for LoRA adaptation by leveraging the hierarchical low-rank structures of LoRA gradients and approximating gradients with chained low-rank approximations. The study considers two practical scenarios: partial adaptation (e.g., only $W_V$ and $W_Q$) and full adaptation (e.g., $W_Q$, $W_V$, and $W_K$) of attention weights.

The results show that the efficiency of LoRA adaptation depends on the norms of the input, pretrained, and adaptor weights, with an inefficiency threshold identified under the Strong Exponential Time Hypothesis (SETH). The authors demonstrate that exact gradient computations for LoRA are computationally expensive, but nearly linear time approximations are achievable under certain conditions. The paper provides theoretical insights into the computational complexity of LoRA and its implications for efficient fine-tuning of large transformer-based models.
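For readers unfamiliar with the setup, the partial versus full adaptation scenarios differ only in which attention weight matrices receive a low-rank update of the form $W + \frac{\alpha}{r} BA$. The NumPy sketch below is a minimal illustration of the two settings, not the paper's algorithm; the helper name `lora_adapt`, the rank $r$, and the scaling $\alpha$ are assumptions chosen for illustration.

```python
import numpy as np

def lora_adapt(W, r=8, alpha=16, rng=None):
    """Return W plus a rank-r LoRA update (alpha/r) * B @ A.

    Following the usual LoRA convention, B starts at zero and A is small
    random, so the adapted weight equals the pretrained W before training.
    (Illustrative sketch only; not the paper's approximation algorithm.)
    """
    rng = np.random.default_rng() if rng is None else rng
    d_out, d_in = W.shape
    B = np.zeros((d_out, r))                 # trainable, initialized to zero
    A = rng.standard_normal((r, d_in)) / r   # trainable, small random init
    return W + (alpha / r) * (B @ A)

d = 64
rng = np.random.default_rng(0)
W_Q, W_K, W_V = (rng.standard_normal((d, d)) for _ in range(3))

# Partial adaptation: only W_Q and W_V receive low-rank adaptors.
W_Q_partial = lora_adapt(W_Q, rng=rng)
W_V_partial = lora_adapt(W_V, rng=rng)

# Full adaptation: W_Q, W_K, and W_V all receive low-rank adaptors.
W_Q_full, W_K_full, W_V_full = (lora_adapt(W, rng=rng) for W in (W_Q, W_K, W_V))
```

Because only the low-rank factors $A$ and $B$ are trained, the number of trainable parameters scales with $r$ rather than with the full weight dimension; the paper's results concern how cheaply the gradients of these adapted attention weights can be approximated.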