27 Mar 2024 | Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, Eric Xing
This paper introduces TDA, a training-free dynamic adapter for efficient and effective test-time adaptation of vision-language models (VLMs). TDA uses a lightweight key-value cache that maintains a dynamic queue of test-sample features as keys and the corresponding few-shot pseudo labels as values. This enables gradual adaptation to test data through progressive pseudo-label refinement and is highly efficient because it requires no backpropagation. In addition, TDA introduces negative pseudo labeling, which mitigates the impact of noisy pseudo labels by assigning pseudo labels to certain negative classes when the model is uncertain about its prediction. Extensive experiments on two benchmarks demonstrate that TDA outperforms state-of-the-art methods in both accuracy and efficiency, reducing testing time on the ImageNet dataset from over 12 hours to 16 minutes.
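To make the cache mechanism concrete, below is a minimal sketch of the training-free key-value cache idea, written from the description above rather than from the authors' released code. The class name, the fixed per-class capacity, and the entropy-based eviction rule are our assumptions for illustration; the key point is that the cache adapts by replacing uncertain entries with more confident ones, with no gradient updates anywhere.

```python
import torch
import torch.nn.functional as F

class KVCache:
    """Per-class queues of test-sample features, keyed by pseudo label."""

    def __init__(self, num_classes: int, capacity: int = 3):
        self.num_classes = num_classes
        self.capacity = capacity
        # Each entry is an (entropy, feature) pair; lower entropy = cleaner.
        self.queues = {c: [] for c in range(num_classes)}

    def update(self, feature: torch.Tensor, probs: torch.Tensor) -> None:
        """Insert a test sample under its pseudo label, evicting the most
        uncertain entry once the queue is full (progressive refinement)."""
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()
        label = int(probs.argmax())
        queue = self.queues[label]
        queue.append((entropy, feature))
        queue.sort(key=lambda pair: pair[0])  # most confident first
        del queue[self.capacity:]             # drop the noisiest entries

    def predict(self, feature: torch.Tensor) -> torch.Tensor:
        """Similarity-weighted vote over cached keys; no backpropagation."""
        logits = torch.zeros(self.num_classes)
        for label, queue in self.queues.items():
            for _, key in queue:
                logits[label] += F.cosine_similarity(feature, key, dim=0)
        return logits
```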
TDA is training-free and efficient, making it suitable for real-world applications where models must adapt quickly to new environments, and it is particularly effective under distribution shifts between training and test data. The method maintains a dynamic cache that accumulates knowledge from the stream of test samples, from which the model generates positive and negative predictions that are combined with the CLIP predictions to produce the final output. The positive cache collects high-quality few-shot pseudo labels, while the negative cache counteracts noisy pseudo labels by identifying class absence rather than presence. By combining the two caches, TDA achieves superior performance in both speed and accuracy.
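The sketch below illustrates how negative pseudo labels might be assigned and how the two cache predictions could be fused with the CLIP logits. The probability threshold and the fusion weights (pos_weight, neg_weight) are illustrative placeholders, not the paper's tuned values, and encode_image, clip_head, pos_cache, and neg_cache in the commented loop are hypothetical stand-ins for a frozen CLIP encoder, its zero-shot head, and two caches like the one sketched earlier.

```python
import torch

def negative_pseudo_labels(probs: torch.Tensor, thresh: float = 0.03) -> torch.Tensor:
    """Flag classes the model is fairly sure are absent (illustrative
    rule: predicted probability below a hypothetical threshold)."""
    return (probs < thresh).float()

def fuse_predictions(clip_logits: torch.Tensor,
                     pos_logits: torch.Tensor,
                     neg_logits: torch.Tensor,
                     pos_weight: float = 2.0,
                     neg_weight: float = 0.1) -> torch.Tensor:
    """Final prediction: CLIP evidence, plus positive-cache support,
    minus negative-cache evidence of class absence."""
    return clip_logits + pos_weight * pos_logits - neg_weight * neg_logits

# Illustrative streaming loop over the test set:
# for image in test_stream:
#     feat = encode_image(image)
#     probs = clip_head(feat).softmax(dim=-1)
#     pos_cache.update(feat, probs)
#     final = fuse_predictions(clip_head(feat),
#                              pos_cache.predict(feat),
#                              neg_cache.predict(feat))
```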
Compared with existing test-time adaptation methods such as TPT and DiffTPT, TDA is both more efficient and more effective: it cuts testing time substantially, improves accuracy across benchmarks, is more robust to noisy pseudo labels, and generalizes better to the test data. Extensive experiments on two benchmarks confirm that TDA outperforms state-of-the-art test-time adaptation methods while significantly reducing testing time. This work advances both the research and practical value of test-time adaptation and offers a promising solution to the efficiency bottleneck in adapting vision-language models at test time.