June 11, 2024 | Can Yaras, Peng Wang, Laura Balzano, and Qing Qu
This paper explores the benefits of overparameterization in deep learning models while addressing the computational challenges it introduces. The authors demonstrate that by leveraging the inherent low-dimensional structures of data and compressible dynamics within model parameters, it is possible to achieve the benefits of overparameterization without the associated computational burden. They apply this approach to deep low-rank matrix completion and language model fine-tuning, showing significant improvements in training efficiency and performance.
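To make the deep matrix completion setting concrete, the following toy sketch fits a partially observed low-rank matrix by running gradient descent on a product of several full-size factors, so that depth, rather than width, supplies the overparameterization. It is not the authors' code; the dimensions, initialization scale, learning rate, and observation rate are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
n, r, depth = 100, 5, 3                        # matrix size, true rank, factorization depth

# Ground-truth low-rank matrix and a random observation mask (toy setup).
M = torch.randn(n, r) @ torch.randn(r, n)
mask = (torch.rand(n, n) < 0.3).float()        # observe roughly 30% of the entries

# Deep overparameterized factorization: each factor is a full n x n matrix, initialized small.
factors = [torch.nn.Parameter(0.05 * torch.randn(n, n)) for _ in range(depth)]
opt = torch.optim.SGD(factors, lr=1.0)

for step in range(2000):
    W = factors[0]
    for F in factors[1:]:
        W = F @ W                              # product W_L ... W_1
    loss = ((mask * (W - M)) ** 2).sum() / mask.sum()   # error on observed entries only
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  observed-entry MSE {loss.item():.4f}")
```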
The key insight is that the learning dynamics of each weight matrix in deep overparameterized models are confined to an invariant low-dimensional subspace. This allows for the construction and training of compact, highly compressed factorizations that retain the benefits of their overparameterized counterparts. In the context of deep matrix completion, this approach substantially improves training efficiency while maintaining the advantages of overparameterization. For language model fine-tuning, the authors propose a method called "Deep LoRA," which improves upon the existing low-rank adaptation (LoRA) technique by reducing overfitting and simplifying hyperparameter setup while maintaining comparable efficiency.
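As an illustration of what a deeper adapter can look like, the sketch below wraps a frozen pretrained linear layer with a trainable update factored through three small matrices instead of the two used by vanilla LoRA. This is one reading of the idea rather than the authors' implementation; the class name, rank, and initialization scheme are assumptions made for the example.

```python
import torch
import torch.nn as nn

class DeepLowRankAdapter(nn.Module):
    """Frozen base linear layer plus a depth-3 low-rank update (illustrative sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # keep the pretrained weights frozen
        d_out, d_in = base.weight.shape
        # Vanilla LoRA parameterizes the update as B @ A (two factors); here it is a
        # product of three factors, mirroring the deeper factorization studied in the paper.
        self.A = nn.Parameter(torch.empty(r, d_in))
        self.B = nn.Parameter(torch.zeros(r, r))   # zero middle factor: the update starts at 0
        self.C = nn.Parameter(torch.empty(d_out, r))
        nn.init.orthogonal_(self.A)
        nn.init.orthogonal_(self.C)

    def forward(self, x):
        delta = self.C @ self.B @ self.A       # (d_out x d_in) low-rank update
        return self.base(x) + x @ delta.T

# Hypothetical usage: wrap a single projection layer of a pretrained model.
layer = DeepLowRankAdapter(nn.Linear(768, 768), r=8)
out = layer(torch.randn(4, 768))
```

For the same rank r, the extra r-by-r middle factor adds only a negligible number of trainable parameters, which is consistent with the comparable efficiency noted above.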
Deep LoRA is shown to be effective on natural language tasks, particularly when fine-tuning with limited data. The method exploits the invariant low-dimensional subspaces identified in the learning dynamics to keep fine-tuning parameter-efficient. Extensive experiments validate Deep LoRA, showing that it outperforms vanilla LoRA and is more robust to hyperparameter choices while remaining comparably efficient.
Theoretical analysis supports these findings, showing that the learning dynamics of deep overparameterized models are confined to low-dimensional subspaces, which can be exploited for efficient training. The results highlight the potential of compressible dynamics in deep learning for achieving efficient and effective model training and adaptation.
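One way to see the claimed confinement empirically is to track the numerical rank of the accumulated weight update during training. The toy check below, a sketch under assumed hyperparameters rather than a reproduction of the paper's experiments, fits a rank-r target with a two-factor linear parameterization started from a small initialization and reports how many singular values of the update to one factor are non-negligible:

```python
import torch

torch.manual_seed(0)
d, r = 64, 4

# Toy objective: fit a rank-r target with a two-factor (deep linear) parameterization from small init.
target = torch.randn(d, r) @ torch.randn(r, d)
W1 = torch.nn.Parameter(1e-3 * torch.randn(d, d))
W2 = torch.nn.Parameter(1e-3 * torch.randn(d, d))
opt = torch.optim.SGD([W1, W2], lr=1.0)

W1_init = W1.detach().clone()
for step in range(2000):
    loss = ((W2 @ W1 - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 400 == 0:
        # Numerical rank of the accumulated update to W1: singular values above 1% of the largest.
        s = torch.linalg.svdvals(W1.detach() - W1_init)
        eff_rank = int((s > 0.01 * s[0]).sum())
        print(f"step {step:4d}  loss {loss.item():.4f}  numerical rank of the update to W1: {eff_rank}")
```

Under such small, near-balanced initializations the update typically concentrates in roughly r directions, which is the structure the compressed factorizations described above exploit.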