This paper introduces Low-Rank Side Adaptation (LoSA), a method for adapting large pre-trained vision models that is efficient in parameters, training time, and memory. Unlike existing methods that require backpropagating gradients through the backbone, LoSA keeps the backbone frozen and instead learns a parallel network that refines the backbone's features. This approach achieves state-of-the-art accuracy-parameter trade-offs on the VTAB benchmark and outperforms prior methods in training time and memory usage. LoSA is also shown to scale efficiently to large models, such as a 4-billion-parameter ViT-e backbone, for video classification without requiring complex model parallelism. The method uses a low-rank mixer architecture that alternates between the channel and token dimensions to achieve high accuracy while maintaining efficiency. Experiments on image and video classification demonstrate that LoSA outperforms existing parameter-efficient adaptation methods in accuracy, training speed, and memory usage. The paper also highlights the importance of evaluating efficiency metrics beyond the number of learned parameters alone, and provides a comprehensive analysis of the effectiveness of different adaptation strategies.
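The low-rank mixer alternating between token and channel dimensions could be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual design: the function name, the residual connections, the rank, and the random initialization are all assumptions made for the example.

```python
import numpy as np

def low_rank_mixer_block(x, rank=4, seed=0):
    """Hypothetical sketch of one low-rank mixer block.

    x: feature map of shape (num_tokens, num_channels).
    Alternates a low-rank mix over the token dimension with a
    low-rank mix over the channel dimension, each factorized through
    a bottleneck of size `rank` instead of a full square matrix.
    """
    n, d = x.shape
    rng = np.random.default_rng(seed)

    # Token mixing: factor an n x n map as (n x rank) @ (rank x n).
    token_down = rng.standard_normal((n, rank)) / np.sqrt(n)
    token_up = rng.standard_normal((rank, n)) / np.sqrt(rank)
    x = x + token_up.T @ (token_down.T @ x)  # residual token mix

    # Channel mixing: factor a d x d map as (d x rank) @ (rank x d).
    chan_down = rng.standard_normal((d, rank)) / np.sqrt(d)
    chan_up = rng.standard_normal((rank, d)) / np.sqrt(rank)
    x = x + (x @ chan_down) @ chan_up        # residual channel mix

    return x

features = np.ones((8, 16))                  # 8 tokens, 16 channels
refined = low_rank_mixer_block(features)
```

The factorization is what keeps the adapter small: each mixing step stores two thin matrices of total size `2 * rank * n` (or `2 * rank * d`) rather than a dense `n x n` (or `d x d`) weight.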