Scalable MatMul-free Language Modeling

18 Jun 2024 | Rui-Jie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, Peng Zhou, Jason K. Eshraghian
This paper presents a scalable MatMul-free language model (MatMul-free LM) that eliminates matrix multiplication (MatMul) operations while maintaining strong performance at billion-parameter scales. The authors demonstrate that their proposed models achieve performance comparable to state-of-the-art Transformers with significantly reduced memory usage during inference. They investigate the scaling laws and find that the performance gap between MatMul-free models and full-precision Transformers narrows as the model size increases. The paper also includes an optimized GPU implementation that reduces memory usage by up to 61% over an unoptimized baseline during training, and a custom FPGA accelerator that reduces memory consumption by more than 10× compared to unoptimized models. The work highlights the potential of lightweight models in reducing computational demands and energy use in real-world applications.
The code implementation is available at <https://github.com/ridgerchu/matmulfreellm>.
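The key idea behind eliminating MatMul is constraining weights to ternary values {-1, 0, +1}, so each element-wise "multiplication" collapses into an addition, a subtraction, or a skip. The sketch below illustrates this principle in NumPy; the function name and shapes are illustrative assumptions, not the paper's actual implementation (which uses fused GPU kernels).

```python
import numpy as np

def ternary_linear(x, w_ternary):
    """Illustrative sketch: apply a ternary weight matrix {-1, 0, +1}
    using only additions and subtractions, no multiplications.

    x:         (batch, in_features) activations
    w_ternary: (in_features, out_features) with entries in {-1, 0, +1}
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]))
    for j in range(w_ternary.shape[1]):
        plus = w_ternary[:, j] == 1    # inputs to add
        minus = w_ternary[:, j] == -1  # inputs to subtract
        # zero-weight inputs are simply skipped
        out[:, j] = x[:, plus].sum(axis=1) - x[:, minus].sum(axis=1)
    return out

x = np.array([[1.0, 2.0, 3.0]])
w = np.array([[1, 0], [-1, 1], [0, -1]])
print(ternary_linear(x, w))  # matches x @ w, computed without multiplies
```

In the actual models, this substitution is paired with quantized activations and hardware-aware kernels, which is where the reported memory and energy savings come from.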