GaLore is a memory-efficient training strategy for large language models (LLMs) that allows full-parameter learning while significantly reducing memory usage compared to common low-rank adaptation methods such as LoRA. Its key idea is to exploit the low-rank structure of the gradients during training, rather than approximating the weight matrices themselves as low-rank. This reduces the memory required for optimizer states by up to 65.5% in pre-training and 82.5% in fine-tuning while maintaining efficiency and performance. GaLore is compatible with a range of optimizers, including Adam, 8-bit Adam, and Adafactor, and applies to both pre-training and fine-tuning. It enables training large models on consumer-grade GPUs with limited memory, such as the NVIDIA RTX 4090, without model parallelism, checkpointing, or offloading strategies. On benchmarks such as GLUE, GaLore achieves comparable or better performance than existing low-rank methods, making it a valuable tool for training large-scale models on limited hardware.
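The core mechanism can be illustrated with a short sketch. The following is a minimal, hypothetical example (not GaLore's actual implementation; function and variable names are invented for illustration) showing why projecting gradients into a low-rank subspace shrinks the optimizer state: a momentum-style state is kept at the reduced size `(rank, n)` instead of the full `(m, n)`, and updates are projected back to the full weight matrix.

```python
import numpy as np

def galore_step(W, G, state, rank=4, lr=0.01, beta=0.9):
    """One illustrative update step: the momentum-style optimizer
    state lives in a rank-`rank` subspace of the gradient."""
    # Build a projector from the gradient's top singular directions.
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    P = U[:, :rank]                  # (m, rank) projection matrix
    R = P.T @ G                      # projected gradient: (rank, n)
    # Optimizer state is stored at the reduced size, saving memory.
    state = beta * state + (1 - beta) * R
    # Project the low-rank update back and apply it to the full weights.
    W = W - lr * (P @ state)
    return W, state

rng = np.random.default_rng(0)
m, n, rank = 64, 32, 4
W = rng.standard_normal((m, n))
G = rng.standard_normal((m, n))          # gradient of the loss w.r.t. W
state = np.zeros((rank, n))              # vs. (m, n) for full momentum/Adam
W, state = galore_step(W, G, state, rank=rank)
print(state.shape)                       # (4, 32): far smaller than (64, 32)
```

In practice GaLore also refreshes the projection matrix periodically rather than recomputing the SVD every step, which keeps the projection cost amortized over many updates.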