13 May 2024 | Andrei Tomut, Saeed S. Jahromi, Abhijoy Sarkar, Uygar Kurt, Sukhbirinder Singh, Faysal Ishtiaq, César Muñoz, Pradeep Singh Bajaj, Ali Elborady, Gianni del Bimbo, Mehrzad Alizadeh, David Montero, Pablo Martín-Ramiro, Muhammad Ibrahim, Oussama Tahiri Aloui, John Malcolm, Samuel Mugel, Román Orús
The paper introduces *CompactifAI*, a compression method for Large Language Models (LLMs) based on quantum-inspired Tensor Networks. The approach truncates the model's correlation space, allowing for more controlled and interpretable compression, and it can be combined with other compression techniques such as quantization. In the benchmarks, *CompactifAI* combined with quantization reduces the memory size of LLaMA-2 7B by 93% and the number of parameters by 70%, while speeding up training by 50% and inference by 25%, with only a 2-3% accuracy drop. The study also finds that deeper layers are more amenable to tensor network compression, supporting recent observations that deeper layers contribute comparatively little to LLM performance. The results suggest that standard LLMs are heavily overparametrized and do not need to be as large as they currently are.
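The paper itself does not ship code, but the core idea of truncating a layer's correlation space can be sketched with a toy tensor-network decomposition: reshape a weight matrix into a two-core Matrix Product Operator and keep only the `chi` largest singular values. The function names, the matrix sizes, and the bond dimension below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def mpo_compress(W, m_dims, n_dims, chi):
    """Split a weight matrix W (m x n) into a two-core MPO with bond dimension chi.

    m_dims = (m1, m2) with m1 * m2 == W.shape[0]
    n_dims = (n1, n2) with n1 * n2 == W.shape[1]
    (Illustrative sketch only; not the paper's implementation.)
    """
    m1, m2 = m_dims
    n1, n2 = n_dims
    # Regroup indices so the first core owns (m1, n1) and the second owns (m2, n2).
    T = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    # Truncated SVD: keep only the chi largest singular values (the retained "correlation space").
    U, S, Vh = np.linalg.svd(T, full_matrices=False)
    U, S, Vh = U[:, :chi], S[:chi], Vh[:chi, :]
    core1 = (U * S).reshape(m1, n1, chi)   # shape (m1, n1, chi)
    core2 = Vh.reshape(chi, m2, n2)        # shape (chi, m2, n2)
    return core1, core2

def mpo_reconstruct(core1, core2):
    """Contract the two cores back into a full (m1*m2 x n1*n2) matrix."""
    m1, n1, _ = core1.shape
    _, m2, n2 = core2.shape
    T = np.tensordot(core1, core2, axes=([2], [0]))   # (m1, n1, m2, n2)
    return T.transpose(0, 2, 1, 3).reshape(m1 * m2, n1 * n2)

# Toy example: a random 1024 x 1024 "layer" compressed with bond dimension 32.
# A random matrix has no low-rank structure, so the error here is only illustrative.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)
c1, c2 = mpo_compress(W, (32, 32), (32, 32), chi=32)
W_hat = mpo_reconstruct(c1, c2)

original = W.size
compressed = c1.size + c2.size
print(f"parameters: {original} -> {compressed} ({compressed / original:.1%} of original)")
print(f"relative error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```

In this sketch the compression ratio is set entirely by the bond dimension `chi`, which is what makes the truncation controllable and interpretable; real LLM weight matrices carry far more structure than the random matrix used here, which is why modest bond dimensions can preserve accuracy in the paper's benchmarks.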