Conditional Prompt Learning for Vision-Language Models


6 Oct 2022 | Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
This paper introduces Conditional Context Optimization (CoCoOp), a method for improving the generalizability of learned prompts in vision-language models (VLMs). The authors address a key weakness of the existing method, Context Optimization (CoOp): its learned context overfits to the base classes seen during training and fails to generalize to new, unseen classes. CoCoOp extends CoOp with a lightweight neural network, called Meta-Net, that generates an input-conditional token for each image. The resulting dynamic prompts adapt to each instance rather than to a fixed set of classes, making them less sensitive to class shift and improving generalization.

Experiments on 11 datasets show that CoCoOp outperforms CoOp in generalization to unseen classes, and it also performs well in cross-dataset transfer and domain generalization. The method is parameter-efficient and simple in design, allowing pre-trained VLMs to be adapted effectively to downstream tasks. The results demonstrate that instance-conditional prompts are more transferable and generalizable than static prompts.

The paper also discusses limitations, such as training efficiency and the remaining gap between manual and learning-based prompts, which calls for further research. Overall, the study provides valuable insights into the generalizability of prompt learning and highlights the effectiveness of conditional prompt learning across several problem scenarios.
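
To make the instance-conditioning idea concrete, below is a minimal PyTorch-style sketch of how a Meta-Net can produce a per-image bias that shifts a set of shared, learnable context vectors. The class name, feature dimensions, and the two-layer bottleneck width are illustrative assumptions rather than the authors' exact implementation; the core idea is simply that each context token becomes v_m(x) = v_m + h(x), where h is the Meta-Net and x is the image feature from the frozen encoder.

```python
import torch
import torch.nn as nn


class CoCoOpPromptLearner(nn.Module):
    """Minimal sketch of CoCoOp-style conditional prompt learning.

    Dimensions and the Meta-Net width here are assumptions for illustration,
    not the exact configuration from the paper or the released code.
    """

    def __init__(self, n_ctx: int = 4, ctx_dim: int = 512, vis_dim: int = 512):
        super().__init__()
        # Learnable context vectors shared across classes (as in CoOp).
        self.ctx = nn.Parameter(torch.empty(n_ctx, ctx_dim))
        nn.init.normal_(self.ctx, std=0.02)
        # Lightweight Meta-Net: maps an image feature to a single bias token.
        self.meta_net = nn.Sequential(
            nn.Linear(vis_dim, vis_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(vis_dim // 16, ctx_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, vis_dim) from the frozen image encoder.
        bias = self.meta_net(image_features)   # (batch, ctx_dim)
        bias = bias.unsqueeze(1)               # (batch, 1, ctx_dim)
        ctx = self.ctx.unsqueeze(0)            # (1, n_ctx, ctx_dim)
        # Instance-conditional context: v_m(x) = v_m + pi, with pi = MetaNet(x).
        return ctx + bias                      # (batch, n_ctx, ctx_dim)


if __name__ == "__main__":
    learner = CoCoOpPromptLearner()
    feats = torch.randn(8, 512)                # stand-in for frozen image features
    prompts = learner(feats)
    print(prompts.shape)                       # torch.Size([8, 4, 512])
```

In the full pipeline, each instance-conditional context would be concatenated with the class name embeddings and passed through the frozen text encoder to score the image against every class; the sketch above covers only the prompt-generation step that distinguishes CoCoOp from CoOp's static context.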