FACING THE ELEPHANT IN THE ROOM: VISUAL PROMPT TUNING OR FULL FINETUNING?

2024 | Cheng Han, Qifan Wang, Yiming Cui, Wenguan Wang, Lifu Huang, Siyuan Qi, Dongfang Liu
This paper investigates the conditions under which Visual Prompt Tuning (VPT) outperforms full finetuning (FT) in transfer learning for vision tasks. Analyzing 19 datasets and tasks, the study finds VPT preferable in three of four transfer learning scenarios. VPT is particularly effective when there is a large disparity between the original and downstream task objectives (e.g., transitioning from classification to counting) or when the data distributions of the two tasks are similar (e.g., both involve natural images). The study also examines why VPT succeeds, finding that reduced overfitting is not the sole factor; rather, VPT's distinctive approach of preserving the pretrained features while adding a small set of new parameters plays a crucial role. The results show that VPT outperforms FT in 16 of 19 cases, with FT becoming more favorable as the downstream dataset size grows. Additionally, the study demonstrates that VPT achieves competitive performance with far fewer trainable parameters, making it suitable for both low- and high-resource scenarios. The paper also provides visualizations showing that VPT enhances feature learning, particularly in cases where full finetuning fails. Overall, the study offers insight into when VPT is effective and guidance for its optimal use in transfer learning.
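
To make the contrast concrete, below is a minimal PyTorch sketch of the VPT idea versus full finetuning: the pretrained backbone is frozen and the only trainable parameters are a handful of learnable prompt tokens plus a task head, whereas FT would leave every parameter trainable. The class and parameter names (`PromptedViT`, `num_prompts`, etc.) are illustrative assumptions for this sketch, not the paper's released implementation.

```python
# Minimal sketch of VPT-style prompt tuning on a frozen transformer backbone.
# Assumption: a generic encoder stands in for a pretrained ViT; names are illustrative.

import torch
import torch.nn as nn


class PromptedViT(nn.Module):
    """Frozen encoder plus learnable prompt tokens (shallow-VPT style) and a task head."""

    def __init__(self, backbone: nn.Module, embed_dim: int,
                 num_prompts: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        # VPT freezes the backbone, preserving the pretrained features intact.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # The only new trainable parameters: prompt tokens and a linear head.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, embed_dim) patch embeddings.
        batch = tokens.size(0)
        prompts = self.prompts.expand(batch, -1, -1)
        # Prepend the prompts so the frozen encoder attends to them alongside patches.
        x = torch.cat([prompts, tokens], dim=1)
        x = self.backbone(x)
        # Pool over the sequence and classify.
        return self.head(x.mean(dim=1))


if __name__ == "__main__":
    embed_dim, num_classes = 192, 10
    layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=4)

    model = PromptedViT(encoder, embed_dim, num_prompts=8, num_classes=num_classes)
    dummy = torch.randn(2, 196, embed_dim)  # e.g. 14x14 patch embeddings
    print(model(dummy).shape)  # torch.Size([2, 10])

    # Full finetuning would train all parameters; here only prompts + head are trainable.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable/total parameters: {trainable}/{total}")
```

The parameter count printed at the end illustrates the paper's efficiency point: the prompt tokens and head are a small fraction of the full model, which is why VPT remains practical in low-resource settings while FT's advantage only emerges as downstream data grows.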