FusionBench: A Comprehensive Benchmark of Deep Model Fusion

14 Jun 2024 | Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao
FusionBench is a comprehensive benchmark for evaluating deep model fusion techniques. It covers a wide range of tasks, including open-vocabulary image classification, text classification, and text-to-text generation. Each task includes up to eight sub-tasks with corresponding models, spanning both full fine-tuning and LoRA fine-tuning as well as models of different sizes. In total, the benchmark comprises 26 distinct tasks, 74 fine-tuned models, and 16 fusion techniques.

FusionBench is built as a modular, extensible platform with three core modules: the Algorithm Module, the Model Pool Module, and the Task Pool Module. It ships with detailed documentation, code examples, and tutorials to help researchers understand and replicate the results. The goal is a fair, balanced comparison of multi-task model fusion techniques across different tasks, model scales, and fine-tuning strategies.

The fusion techniques covered fall into three families: model ensemble, model merging, and model mixing. These are evaluated on tasks including image classification, scene understanding, and text classification. The results show that multi-task model fusion algorithms generally outperform the pre-trained model, with some methods achieving the best overall performance. The benchmark also evaluates the generalization and robustness of these algorithms, showing that fused models can adapt to new tasks and handle corrupted test sets. By reducing the need to retrain models from scratch, FusionBench is expected to accelerate the development of deep model fusion algorithms and to lower the carbon footprint associated with training them.
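To make the distinction between these fusion families concrete, here is a minimal, self-contained PyTorch sketch, not FusionBench's actual API and with all function names illustrative, contrasting model ensembling (keep every model and combine their outputs) with the simplest form of model merging (average the parameters into one model):

```python
# Illustrative sketch only: contrasts two fusion families on plain
# PyTorch modules. Assumes all models share the same architecture.
import copy
from typing import List

import torch
import torch.nn as nn


def ensemble_predict(models: List[nn.Module], x: torch.Tensor) -> torch.Tensor:
    """Model ensemble: run every model and average their outputs."""
    with torch.no_grad():
        return torch.stack([m(x) for m in models]).mean(dim=0)


def simple_average(models: List[nn.Module]) -> nn.Module:
    """Model merging (simple weight averaging): build a single model
    whose parameters are the element-wise mean of all input models."""
    merged = copy.deepcopy(models[0])
    state = merged.state_dict()
    for key in state:
        state[key] = torch.stack(
            [m.state_dict()[key].float() for m in models]
        ).mean(dim=0)
    merged.load_state_dict(state)
    return merged


if __name__ == "__main__":
    models = [nn.Linear(4, 2) for _ in range(3)]
    x = torch.randn(1, 4)
    print(ensemble_predict(models, x))    # fuses predictions at inference time
    print(simple_average(models)(x))      # fuses weights once, then predicts
```

The trade-off this sketch illustrates: ensembling keeps all models around and multiplies inference cost, while merging pays the fusion cost once and keeps inference cost constant; more sophisticated merging methods in the benchmark reweight or align parameters rather than averaging them uniformly.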