This paper introduces Self-Prompt Tuning (SPT), a method that enhances visual prompt tuning (VPT) by addressing its key challenges: prompt initialization, sensitivity to prompt length, and weak performance under self-supervised pretraining. The core idea of SPT is to initialize prompts with token prototypes derived from the downstream task's data, which significantly improves performance. SPT is robust to the choice of prompt length and scales well with model capacity and training data size. It outperforms existing methods, achieving 10-30% higher accuracy than VPT after MAE pretraining and surpassing full fine-tuning in 19 of 24 cases while updating less than 0.4% of the learnable parameters, all at minimal additional computational cost. The paper also examines how different sampling strategies for constructing prompts affect results and shows consistent gains over VPT across scenarios, indicating that SPT is a promising approach to parameter-efficient fine-tuning for visual tasks.
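To make the prototype-based initialization concrete, the following is a minimal Python sketch of the idea, not the authors' exact procedure: it assumes a frozen ViT-like backbone that returns patch-token embeddings of shape (B, N, D), and it uses k-means clustering as one plausible way to derive token prototypes. The function name and the clustering choice are illustrative assumptions.

```python
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def init_prompts_from_prototypes(backbone, loader, num_prompts, device="cpu"):
    """Hypothetical sketch: derive prompt initializations by clustering
    patch-token embeddings that a frozen backbone produces on downstream
    data. Assumes `backbone(images)` returns tokens of shape (B, N, D)."""
    backbone.eval().to(device)
    tokens = []
    for images, _ in loader:
        feats = backbone(images.to(device))       # (B, N, D) patch tokens (assumed interface)
        tokens.append(feats.flatten(0, 1).cpu())  # collect as (B*N, D)
    tokens = torch.cat(tokens)

    # Cluster the collected tokens into `num_prompts` prototypes; the
    # centroids then serve as initial values for the learnable prompts.
    km = KMeans(n_clusters=num_prompts, n_init=10).fit(tokens.numpy())
    prototypes = torch.from_numpy(km.cluster_centers_).float()  # (num_prompts, D)
    return torch.nn.Parameter(prototypes)  # prompts start at data-driven prototypes
```

The prompts returned here would replace the randomly initialized prompt vectors of standard VPT; training then proceeds as usual with the backbone frozen and only the prompts (and task head) updated.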