9 Feb 2024 | Lucio Dery, Steven Kolawole, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar
Bonsai is a gradient-free, perturbative pruning method for large language models (LLMs) that uses only forward passes to produce small, fast, and accurate pruned models. It is designed for practitioners with limited hardware who need to prune models too large to fit in their available memory: because it never computes gradients, it avoids the memory overhead of gradient-based optimization entirely.

Bonsai estimates the importance of each prunable module by sampling sub-models with subsets of modules removed, evaluating their performance with forward passes, and inferring per-module importance from those evaluations. Informative priors guide which sub-models are sampled, and pruning proceeds iteratively, removing the least important modules in stages. Bonsai also supports post-pruning adaptation through distillation for further performance gains. A sketch of the core loop follows below.

Bonsai outperforms existing structured pruning methods in both speed and accuracy. Using a single A6000 GPU, it produces a sub-2B-parameter model that achieves state-of-the-art performance on four of six tasks on the Hugging Face Open LLM leaderboard, and its pruned models are significantly faster and more accurate than those of competing methods while performing well across a variety of tasks.

Its low memory footprint makes Bonsai applicable to a wide range of LLMs, and the work highlights the role of efficient pruning techniques in making LLMs accessible to a broader audience.
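To make the procedure concrete, here is a minimal, self-contained sketch of the forward-pass-only importance estimation described above: sample sub-models by masking modules, score each with forward passes, and regress performance on module inclusion to rank modules. The toy model, the ridge-regression estimator, and all names and constants are illustrative assumptions, not Bonsai's actual implementation.

```python
# Minimal sketch of perturbative, forward-pass-only importance estimation
# in the spirit of Bonsai. The toy "model" below is a hypothetical stand-in;
# the real method masks LLM structures such as attention heads.
import numpy as np

rng = np.random.default_rng(0)

N_MODULES = 32          # prunable units in the model
N_SUBMODELS = 200       # perturbed sub-models to evaluate
KEEP_FRAC = 0.75        # fraction of modules kept in each sub-model

# Hypothetical ground-truth utilities, hidden from the pruner; the toy
# evaluation just measures how much utility a mask retains, plus noise.
true_utility = rng.exponential(scale=1.0, size=N_MODULES)

def evaluate(mask: np.ndarray) -> float:
    """Forward-pass proxy: returns sub-model performance under `mask`."""
    return float(mask @ true_utility + rng.normal(scale=0.1))

# 1. Sample sub-models from a prior over masks (uniform here; Bonsai uses
#    informative priors to pick more promising sub-models).
masks = (rng.random((N_SUBMODELS, N_MODULES)) < KEEP_FRAC).astype(float)

# 2. Evaluate each sub-model using forward passes only.
scores = np.array([evaluate(m) for m in masks])

# 3. Regress performance on module inclusion to estimate per-module
#    importance (ridge regularization keeps the solve well-conditioned).
lam = 1e-2
A = masks.T @ masks + lam * np.eye(N_MODULES)
importance = np.linalg.solve(A, masks.T @ scores)

# 4. Prune the least important modules; repeat in stages in practice.
n_prune = N_MODULES // 4
pruned = np.argsort(importance)[:n_prune]
print("pruned modules:", sorted(pruned.tolist()))
```

In the real setting, `evaluate` would run the masked LLM on a small calibration set, and steps 1 through 4 would repeat over several pruning rounds, re-estimating importance on the progressively smaller model.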