7 Jun 2024 | Xiongtao Zhou, Jie He, Yuhua Ke, Guangyao Zhu, Víctor Gutiérrez-Basulto, Jeff Z. Pan
This paper investigates parameter-efficient fine-tuning (PEFT) methods for multimodal large language models (MLLMs) to address the challenge of fine-tuning models with billions of parameters. The authors evaluate four popular PEFT methods—LoRA, Adapter, IA3, and Prefix-Tuning—on seven datasets spanning two categories, unseen and seen. The study aims to identify methods that effectively enhance MLLM performance while training only a small fraction of the parameters. Key findings include:
1. **Performance**: Adapter outperforms the other PEFT methods across the board, followed by LoRA.
2. **Connector Layers**: Fine-tuning connector layers often improves performance, especially on unseen datasets.
3. **Data Scale**: Larger datasets generally lead to better performance, but medium-sized datasets are more efficient when resources are limited.
4. **Stability**: Adapter shows the best training stability on both unseen and seen datasets.
5. **Generalization**: Adapter exhibits the strongest robustness and generalization performance.
6. **Hallucination**: Adapter reduces hallucinations more effectively than the other methods.
The study provides a comprehensive analysis of PEFT methods for MLLMs, highlighting the importance of fine-tuning connector layers and the benefits of Adapter in terms of performance, stability, and hallucination reduction.
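To make the setup concrete, below is a minimal sketch (not the authors' exact training configuration) of how one of the evaluated methods, LoRA, might be attached to an MLLM's language model while the connector layers are also left trainable. It assumes the Hugging Face `transformers` and `peft` libraries and a LLaVA-style model; the checkpoint name and the `multi_modal_projector` attribute are illustrative assumptions.

```python
# Sketch: LoRA on the LLM backbone + trainable connector, assuming a
# LLaVA-style MLLM from Hugging Face. Names marked below are assumptions.
import torch
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Assumed checkpoint; substitute whichever MLLM is being fine-tuned.
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16
)

# LoRA applied to the language model's attention projections,
# a common choice of target modules.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Additionally unfreeze the connector (finding 2): in LLaVA-style models
# this is the projector mapping vision features into the LLM's space.
for name, param in model.named_parameters():
    if "multi_modal_projector" in name:  # assumed attribute name
        param.requires_grad = True

# Report how few parameters are actually trainable.
model.print_trainable_parameters()
```

Keeping only the low-rank adapters and the connector trainable is what keeps the parameter budget small, which is the trade-off the paper's PEFT comparison examines.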