31 Mar 2024 | Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo
This paper addresses the critical but under-explored challenge of calibration in test-time prompt tuning for large-scale vision-language models like CLIP. Traditional calibration methods rely on substantial labeled data, making them impractical for test-time scenarios. The authors introduce Calibrated Test-time Prompt Tuning (C-TPT), which leverages the inherent properties of CLIP to optimize prompts during test-time, enhancing calibration without requiring labeled data. Through extensive experiments on various datasets and CLIP models, the paper demonstrates that C-TPT effectively improves the calibration of test-time prompt tuning, achieving better-calibrated predictions while maintaining or improving accuracy. The key contributions include the introduction of Average Text Feature Dispersion (ATFD) and its strong negative correlation with Expected Calibration Error (ECE), and the proposed C-TPT method that jointly optimizes the prompt to maximize ATFD. The code for C-TPT is publicly available at https://github.com/hee-suk-yoon/C-TPT.
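To make the two quantities named above concrete, here is a minimal sketch in plain Python. It is not taken from the authors' repository. It assumes that ATFD is the mean L2 distance between each class's text feature and the centroid of all class text features, and that ECE is computed with the common equal-width binning over prediction confidences. The function names `atfd` and `ece`, the bin count, and the toy inputs are illustrative choices, not the paper's API. In practice the text features would come from CLIP's text encoder rather than hand-written vectors.

```python
import math


def atfd(text_features):
    """Average Text Feature Dispersion (assumed definition):
    mean L2 distance from each class text feature to their centroid."""
    n = len(text_features)
    dim = len(text_features[0])
    # Centroid: per-dimension mean over all class text features.
    centroid = [sum(f[d] for f in text_features) / n for d in range(dim)]
    return sum(math.dist(f, centroid) for f in text_features) / n


def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error with equal-width bins (lo, hi].

    confidences: predicted confidence per sample, each in (0, 1].
    correct:     1 if the prediction was right, else 0.
    """
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue  # empty bins contribute nothing
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        # Weight the confidence/accuracy gap by the bin's share of samples.
        total += len(idx) / n * abs(accuracy - avg_conf)
    return total
```

Under these assumptions, text features that all collapse to one point give an ATFD of zero, while widely spread features give a larger value. A test-time objective in the spirit of C-TPT would favor prompts with the larger value, in line with the negative ATFD-ECE correlation the paper reports.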