7 Jun 2024 | Xiongtao Zhou, Jie He, Yuhua Ke, Guangyao Zhu, Víctor Gutiérrez-Basulto, Jeff Z. Pan
This paper investigates parameter-efficient fine-tuning (PEFT) methods for multimodal large language models (MLLMs) to address the challenge of fine-tuning models with billions of parameters. The authors evaluate four popular PEFT methods—LoRA, Adapter, IA3, and Prefix-Tuning—on seven datasets spanning two categories, unseen and seen. The study aims to identify methods that effectively enhance MLLM performance while training only a small fraction of the parameters. Key findings include:
1. **Performance**: Adapter outperforms the other PEFT methods across the board, followed by LoRA.
2. **Connector Layers**: Fine-tuning connector layers often improves performance, especially on unseen datasets.
3. **Data Scale**: Larger datasets generally lead to better performance, but medium-sized datasets are more efficient when resources are limited.
4. **Stability**: Adapter shows the best training stability on both unseen and seen datasets.
5. **Generalization**: Adapter exhibits the strongest robustness and generalization performance.
6. **Hallucination**: Adapter reduces hallucinations more effectively than the other methods.
The study provides a comprehensive analysis of PEFT methods for MLLMs, highlighting the importance of fine-tuning connector layers and the benefits of Adapter in terms of performance, stability, and hallucination reduction.
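To make the setup concrete, below is a minimal sketch (not the authors' exact training configuration) of how one of the evaluated methods, LoRA, might be attached to an MLLM's language model while the connector layers are also left trainable. It assumes the Hugging Face `transformers` and `peft` libraries and a LLaVA-style model; the checkpoint name and the `multi_modal_projector` attribute are illustrative assumptions.

```python
# Sketch: LoRA on the LLM backbone + trainable connector, assuming a
# LLaVA-style MLLM from Hugging Face. Names marked below are assumptions.
import torch
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Assumed checkpoint; substitute whichever MLLM is being fine-tuned.
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16
)

# LoRA applied to the language model's attention projections,
# a common choice of target modules.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Additionally unfreeze the connector (finding 2): in LLaVA-style models
# this is the projector mapping vision features into the LLM's space.
for name, param in model.named_parameters():
    if "multi_modal_projector" in name:  # assumed attribute name
        param.requires_grad = True

# Report how few parameters are actually trainable.
model.print_trainable_parameters()
```

Keeping only the low-rank adapters and the connector trainable is what keeps the parameter budget small, which is the trade-off the paper's PEFT comparison examines.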