FusionBench: A Comprehensive Benchmark of Deep Model Fusion

14 Jun 2024 | Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao
FusionBench is a comprehensive benchmark for evaluating deep model fusion techniques. It covers a wide range of tasks, including open-vocabulary image classification, text classification, and text-to-text generation. Each task includes up to eight sub-tasks with corresponding models, spanning both full fine-tuning and LoRA fine-tuning as well as models of different sizes. In total, the benchmark comprises 26 distinct tasks, 74 fine-tuned models, and 16 fusion techniques.

FusionBench is built as a modular, extensible platform with three core modules: the Algorithm Module, the Model Pool Module, and the Task Pool Module. It ships with detailed documentation, code examples, and tutorials to help researchers understand and replicate the results. The goal is a fair, balanced comparison of multi-task model fusion techniques across different tasks, model scales, and fine-tuning strategies.

The fusion techniques covered fall into three families: model ensemble, model merging, and model mixing. These are evaluated on tasks including image classification, scene understanding, and text classification. The results show that multi-task model fusion algorithms generally outperform the pre-trained model, with some methods achieving the best overall performance. The benchmark also evaluates the generalization and robustness of these algorithms, showing that fused models can adapt to new tasks and handle corrupted test sets. By reducing the need to retrain models from scratch, FusionBench is expected to accelerate the development of deep model fusion algorithms and to lower the carbon footprint associated with training them.
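To make the distinction between these fusion families concrete, here is a minimal, self-contained PyTorch sketch, not FusionBench's actual API and with all function names illustrative, contrasting model ensembling (keep every model and combine their outputs) with the simplest form of model merging (average the parameters into one model):

```python
# Illustrative sketch only: contrasts two fusion families on plain
# PyTorch modules. Assumes all models share the same architecture.
import copy
from typing import List

import torch
import torch.nn as nn


def ensemble_predict(models: List[nn.Module], x: torch.Tensor) -> torch.Tensor:
    """Model ensemble: run every model and average their outputs."""
    with torch.no_grad():
        return torch.stack([m(x) for m in models]).mean(dim=0)


def simple_average(models: List[nn.Module]) -> nn.Module:
    """Model merging (simple weight averaging): build a single model
    whose parameters are the element-wise mean of all input models."""
    merged = copy.deepcopy(models[0])
    state = merged.state_dict()
    for key in state:
        state[key] = torch.stack(
            [m.state_dict()[key].float() for m in models]
        ).mean(dim=0)
    merged.load_state_dict(state)
    return merged


if __name__ == "__main__":
    models = [nn.Linear(4, 2) for _ in range(3)]
    x = torch.randn(1, 4)
    print(ensemble_predict(models, x))    # fuses predictions at inference time
    print(simple_average(models)(x))      # fuses weights once, then predicts
```

The trade-off this sketch illustrates: ensembling keeps all models around and multiplies inference cost, while merging pays the fusion cost once and keeps inference cost constant; more sophisticated merging methods in the benchmark reweight or align parameters rather than averaging them uniformly.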