An Empirical Study on Parameter-Efficient Fine-Tuning for Multimodal Large Language Models

7 Jun 2024 | Xiongtao Zhou, Jie He, Yuhua Ke, Guangyao Zhu, Victor Gutiérrez-Basulto, Jeff Z. Pan
This paper presents an empirical study of parameter-efficient fine-tuning (PEFT) methods for multimodal large language models (MLLMs). It evaluates four popular PEFT methods, LoRA, IA3, Adapter, and Prefix-Tuning, on seven datasets spanning both seen and unseen tasks, with the goal of identifying methods that improve MLLM performance while training only a small fraction of the parameters. The study examines several factors: how each PEFT method affects different models, where the PEFT module is placed, the size of the fine-tuning data, and the resulting model stability, generalization, and hallucination.

The results indicate that Adapter performs best overall, achieving the strongest results in accuracy, stability, generalization, and hallucination reduction. Fine-tuning the connector layers improves performance in most MLLMs, although it does not always yield better results. The best location for the PEFT module varies from model to model. Larger fine-tuning datasets generally lead to better performance, but when resources are limited, medium-sized datasets are more efficient. Finally, the number of trainable parameters is linked to model stability: Adapter and LoRA exhibit different levels of stability on seen versus unseen datasets.

Overall, the study provides a comprehensive analysis of PEFT methods across datasets and models, offering practical guidance for parameter-efficient fine-tuning of MLLMs.
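To make concrete why methods such as LoRA train "only a limited number of parameters," here is a minimal sketch of the parameter-count arithmetic behind a low-rank update. The dimensions and rank below are illustrative assumptions, not values taken from the paper:

```python
def lora_trainable_fraction(d: int, k: int, r: int) -> float:
    """Fraction of parameters that are trainable when a frozen d x k
    weight matrix W is adapted as W + B @ A, where B is d x r and
    A is r x k, with rank r much smaller than min(d, k)."""
    full_params = d * k          # frozen base-weight parameters
    lora_params = r * (d + k)    # trainable low-rank parameters
    return lora_params / full_params

# Illustrative example: a 4096 x 4096 projection with rank-8 LoRA
frac = lora_trainable_fraction(4096, 4096, 8)
print(f"{frac:.4%}")  # prints "0.3906%"
```

Because the trainable fraction scales as r(d + k) / (dk), a small rank keeps the number of updated parameters well below one percent of the frozen matrix, which is the efficiency the paper's comparisons rely on.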