8 Feb 2024 | Yi Xin, Siqi Luo, Haodi Zhou, Junlong Du, Xiaohong Liu, Yue Fan, Qing Li, Yuntao Du
This paper provides a comprehensive survey of parameter-efficient fine-tuning (PEFT) methods for pre-trained vision models (PVMs). As PVMs grow in size, full fine-tuning becomes computationally and memory-intensive, which motivates PEFT methods that adapt a model by updating only a small number of parameters while maintaining performance. The survey categorizes PEFT methods into addition-based, partial-based, and unified-based approaches. Addition-based methods introduce small trainable modules or parameters into an otherwise frozen PVM; examples include adapter tuning, prompt tuning, prefix tuning, and side tuning. Partial-based methods update only a subset of the original parameters, covering specification tuning and reparameter tuning. Unified-based methods, such as NOAH and V-PEFT, integrate several PEFT techniques into a single framework.
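To make the addition-based idea concrete, here is a minimal sketch of adapter tuning in PyTorch. It assumes the timm library; the Adapter class, the bottleneck size of 64, and the choice to insert an adapter after each transformer block's MLP are illustrative assumptions, not details taken from the survey.

```python
import torch
import torch.nn as nn
import timm  # assumed available; provides pre-trained ViT backbones


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        # zero-init the up-projection so each adapter starts as an identity map
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


# Load a pre-trained ViT and freeze all of its original parameters.
model = timm.create_model("vit_base_patch16_224", pretrained=True)
for p in model.parameters():
    p.requires_grad = False

# Attach a trainable adapter after the MLP of every transformer block.
for blk in model.blocks:
    dim = blk.mlp.fc2.out_features  # embedding dimension (768 for ViT-B/16)
    blk.mlp = nn.Sequential(blk.mlp, Adapter(dim))

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```

Because the up-projection is zero-initialized, fine-tuning starts from exactly the pre-trained model's behavior, and only the adapter parameters (a small fraction of the total) receive gradients.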
The survey discusses the characteristics, parameter counts, and applications of different PEFT methods. It notes that while PEFT methods offer significant efficiency gains, they remain difficult to interpret and can trail full fine-tuning in performance on some tasks. The paper also identifies key challenges in the field, including the explainability of visual PEFT methods, their application to generative and multimodal models, and the need for a comprehensive PEFT library for the vision domain.
The survey concludes that PEFT is a promising approach for adapting PVMs to various downstream tasks with minimal computational resources. However, further research is needed to improve the interpretability of PEFT methods, enhance their applicability to generative and multimodal models, and develop a unified PEFT library for the vision domain. The paper serves as a valuable resource for researchers interested in parameter-efficient fine-tuning, offering insights into the current state of the field and potential directions for future research.