Facing the Elephant in the Room: VISUAL PROMPT TUNING OR FULL FINETUNING?

23 Jan 2024 | Cheng Han, Qifan Wang, Yiming Cui, Wenguan Wang, Lifu Huang, Siyuan Qi, Dongfang Liu*
The paper "Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?" by Cheng Han et al. examines when Visual Prompt Tuning (VPT), a parameter-efficient transfer learning technique, is more effective than traditional full finetuning. VPT has gained attention for often outperforming full finetuning despite updating far fewer parameters. The authors conduct a comprehensive analysis across 19 distinct datasets and tasks to understand when and why VPT is more advantageous.

Key findings include:

1. **When VPT is preferred**: VPT is preferable when there is a substantial disparity between the original and downstream task objectives (e.g., transitioning from classification to counting), or when the data distributions of the two tasks are similar (e.g., both involve natural images).
2. **Why VPT works**: Overfitting alone does not explain VPT's success. The way VPT preserves the original features while adding new parameters plays a crucial role: the additional parameters help the model escape local minima and enhance feature learning.
3. **Performance trends**: As the downstream dataset grows, the performance gap between VPT and full finetuning narrows, with full finetuning eventually outperforming VPT in high-resource settings.

The study sheds light on the mechanisms behind VPT and offers guidance for its optimal use, suggesting that practitioners weigh task disparity and dataset scale when choosing between VPT and full finetuning.
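The mechanism the findings refer to — prepending a small set of learnable prompt tokens to the input of a frozen pretrained backbone, so that only the prompts are trained — can be illustrated with a minimal sketch. This is a toy illustration in plain Python, not the paper's implementation; all names, dimensions, and the trainable-parameter accounting are assumptions chosen for clarity.

```python
# Toy sketch of Visual Prompt Tuning (VPT) vs. full finetuning.
# Plain Python lists stand in for real tensors; the "backbone" is
# just a fixed set of patch-token embeddings. Illustrative only.
import random

random.seed(0)

EMBED_DIM = 8     # toy embedding size (assumed)
NUM_PATCHES = 4   # toy number of image patch tokens (assumed)
NUM_PROMPTS = 2   # learnable prompt tokens prepended in VPT (assumed)

def rand_vec(dim):
    """Random vector standing in for a learned embedding."""
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]

# Pretrained patch embeddings: frozen under VPT, so the original
# features are preserved exactly.
backbone_tokens = [rand_vec(EMBED_DIM) for _ in range(NUM_PATCHES)]

# VPT: the only trainable parameters are the prompt tokens.
prompt_tokens = [rand_vec(EMBED_DIM) for _ in range(NUM_PROMPTS)]

# The (frozen) transformer sees prompts concatenated with patches.
vpt_input = prompt_tokens + backbone_tokens

# Trainable-parameter counts illustrate the efficiency gap:
vpt_trainable = NUM_PROMPTS * EMBED_DIM    # prompts only
full_trainable = NUM_PATCHES * EMBED_DIM   # entire toy backbone updated

print(f"sequence length with prompts: {len(vpt_input)}")
print(f"VPT trainable params:  {vpt_trainable}")
print(f"full-FT trainable params: {full_trainable}")
```

In a real ViT the frozen backbone would be the attention and MLP weights, and the prompts would be optimized by gradient descent on the downstream loss; the sketch only shows the structural difference (frozen features plus a small added parameter set) that the paper identifies as central to VPT's behavior.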