C-TPT: CALIBRATED TEST-TIME PROMPT TUNING FOR VISION-LANGUAGE MODELS VIA TEXT FEATURE DISPERSION

2024 | Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo
This paper introduces Calibrated Test-Time Prompt Tuning (C-TPT), a method for improving the calibration of predictions in vision-language models (VLMs) such as CLIP during test-time prompt tuning. Conventional calibration methods rely on labeled data, which is unavailable at test time. C-TPT instead leverages an inherent property of CLIP to optimize prompts for better calibration without labels: prompts that induce higher dispersion among the class text features yield better-calibrated predictions. The paper formalizes this as the Average Text Feature Dispersion (ATFD) and shows that it has a strong negative correlation with Expected Calibration Error (ECE). C-TPT jointly optimizes the prompt at test time to maximize ATFD alongside the standard tuning objective, thereby improving calibration. Experiments across various CLIP architectures and datasets demonstrate that C-TPT effectively enhances calibration without needing labeled data. The code is publicly available at https://github.com/hee-suk-yoon/C-TPT.
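The dispersion objective is straightforward to express in code. Below is a minimal PyTorch sketch of how ATFD and the joint test-time loss might look; the function names and the weight `lambda_disp` are illustrative assumptions, not the authors' reference implementation (see the linked repository for that).

```python
import torch

def average_text_feature_dispersion(text_features: torch.Tensor) -> torch.Tensor:
    """ATFD: mean L2 distance of each class's text feature from their centroid.

    text_features: (num_classes, dim) tensor of L2-normalized CLIP text
    embeddings produced with the current tunable prompt.
    """
    centroid = text_features.mean(dim=0, keepdim=True)      # (1, dim)
    return (text_features - centroid).norm(dim=-1).mean()   # scalar

def ctpt_loss(tpt_entropy_loss: torch.Tensor,
              text_features: torch.Tensor,
              lambda_disp: float = 1.0) -> torch.Tensor:
    """Joint test-time objective: the usual prompt-tuning entropy term plus a
    penalty that rewards higher text feature dispersion (hence the minus sign),
    so minimizing this loss maximizes ATFD."""
    return tpt_entropy_loss - lambda_disp * average_text_feature_dispersion(text_features)
```

Both terms depend only on the prompt and the unlabeled test input, which is why the calibration objective can be optimized at test time without any labels.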