Revisiting the Power of Prompt for Visual Tuning


27 May 2024 | Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, Meng Wang
The paper "Revisiting the Power of Prompt for Visual Tuning" by Yuzhu Wang et al. addresses key challenges of visual prompt tuning (VPT) and its variants: prompt initialization, prompt length, and performance under self-supervised pretraining. Observing that the prompt tokens of well-performing models share high mutual information with the patch tokens during training, the authors propose Self-Prompt Tuning (SPT), which initializes prompts with downstream token prototypes rather than at random, significantly improving performance over existing methods such as VPT and GateVPT. Evaluated on a range of benchmarks, SPT delivers substantial gains over VPT, achieving 10%-30% higher accuracy after MAE pretraining and outperforming full fine-tuning in 19 of 24 cases while using fewer than 0.4% of the learnable parameters. SPT is also robust to prompt length and scales well with model capacity and training data size. Detailed experimental results and ablation studies support the effectiveness of SPT.
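The core idea of prototype-based prompt initialization can be illustrated with a minimal sketch. This is not the authors' implementation: the function name and inputs are hypothetical, and a simple k-means pass stands in for however SPT actually derives prototypes. It assumes patch-token embeddings have already been collected from a frozen backbone on downstream data.

```python
import numpy as np

def init_prompts_from_prototypes(patch_tokens, num_prompts, n_iters=10, seed=0):
    """Initialize prompt vectors as prototypes (cluster centroids) of
    downstream patch-token embeddings, instead of random initialization.

    patch_tokens: (N, D) array of patch embeddings gathered from the
    frozen pretrained backbone on downstream images (hypothetical input).
    Returns a (num_prompts, D) array to use as the initial prompt tokens.
    """
    rng = np.random.default_rng(seed)
    # Seed centroids with randomly chosen patch tokens.
    idx = rng.choice(len(patch_tokens), size=num_prompts, replace=False)
    centroids = patch_tokens[idx].copy()
    for _ in range(n_iters):
        # Assign each token to its nearest centroid (squared Euclidean).
        dists = ((patch_tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Update each centroid as the mean of its assigned tokens.
        for k in range(num_prompts):
            members = patch_tokens[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids
```

During tuning, these prototypes would replace the randomly initialized prompt tokens that are prepended to the patch sequence, so the prompts start out already correlated with the downstream token distribution.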