Compute Better Spent: Replacing Dense Layers with Structured Matrices

2024 | Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson
This paper investigates structured matrices as drop-in replacements for the dense weight matrices in neural network linear layers, with the goal of spending a fixed compute budget more effectively. The authors show that structured matrices can outperform dense matrices in compute efficiency, but only when they are given properly scaled initializations and structure-aware learning rates. They introduce the Block Tensor-Train (BTT) matrix family, which outperforms dense layers on several tasks spanning image classification and language modeling: BTT achieves exponentially lower training loss than dense layers on CIFAR-10/100 with augmentation, and it matches the performance of a dense ViT-S/32 on ImageNet-1k while using 3.8 times less compute.

The study also shows that structured matrices can exhibit better scaling laws than dense matrices, with performance improving more rapidly as compute increases. The authors emphasize that structure-aware learning rates and initialization scales are essential for reaching this performance, and they discuss the trade-off between compute efficiency and memory efficiency, showing that structured matrices can be more memory-efficient while maintaining strong performance. The paper concludes that structured matrices offer a promising direction for improving the efficiency of neural networks, particularly foundation models.
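To make the idea concrete, the sketch below shows a simplified tensor-train-style factorized linear layer in PyTorch. It is illustrative only, not the paper's BTT implementation: BTT adds block structure on top of this factorization, and the factor sizes (m1, m2, n1, n2), the rank, and the simple fan-in initialization scales used here are assumptions chosen to show how two small cores can stand in for one large dense weight matrix.

```python
import torch
import torch.nn as nn


class TTLinear(nn.Module):
    """Simplified tensor-train-style linear layer (illustrative sketch only).

    Replaces a dense (n1*n2) x (m1*m2) weight matrix with two small cores,
    so parameter count and FLOPs scale with the rank rather than with the
    full product of input and output dimensions.
    """

    def __init__(self, m1, m2, n1, n2, rank):
        super().__init__()
        # Simple fan-in scaling for illustration; the paper prescribes
        # specific structure-aware initialization and learning-rate rules.
        self.core1 = nn.Parameter(torch.randn(n1, m1, rank) * m1 ** -0.5)
        self.core2 = nn.Parameter(torch.randn(rank, n2, m2) * (rank * m2) ** -0.5)
        self.m1, self.m2, self.n1, self.n2 = m1, m2, n1, n2

    def forward(self, x):                      # x: (batch, m1 * m2)
        b = x.shape[0]
        x = x.reshape(b, self.m1, self.m2)     # factor the input index
        # Contract over the first input factor with core1,
        # then over the rank and second input factor with core2.
        t = torch.einsum('ijk,bjl->bkil', self.core1, x)   # (b, rank, n1, m2)
        y = torch.einsum('kml,bkil->bim', self.core2, t)   # (b, n1, n2)
        return y.reshape(b, self.n1 * self.n2)


# Example: a 1024 -> 1024 layer with m1 = m2 = n1 = n2 = 32 and rank 4
# uses about 8K parameters instead of the ~1M of a dense layer, and
# roughly a quarter of the multiply-adds per input.
layer = TTLinear(32, 32, 32, 32, rank=4)
out = layer(torch.randn(8, 1024))
print(out.shape)  # torch.Size([8, 1024])
```

This kind of factorization is what lets the structured layer trade a small loss in expressivity per parameter for many more parameters (or larger effective dimensions) at the same compute, which is the trade the paper argues is worth making when initialization and learning rates account for the structure.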