26 Mar 2024 | Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang
Dual Memory Networks (DMN) is a versatile adaptation approach for vision-language models (VLMs) that handles three task settings: zero-shot, few-shot, and training-free few-shot adaptation. The method introduces dynamic and static memory networks to store and retrieve knowledge from historical test samples and labeled training data, respectively. The dynamic memory network accumulates features of historical test samples during testing, allowing the model to exploit information beyond the training set. The static memory network caches knowledge from the few-shot training data, enabling training-free few-shot adaptation. Both memories use a flexible memory interaction strategy that can operate in a training-free mode and can be further enhanced with learnable projection layers. Evaluated on 11 datasets, DMN outperforms existing zero-shot methods by over 3%, even surpassing methods that rely on external training data, and it remains robust under natural distribution shifts. The method is efficient: the training-free setting requires no learnable parameters, and inference remains fast. Across the few-shot and training-free few-shot settings, DMN likewise delivers superior performance, and its combined use of historical test samples and labeled training data establishes a new state of the art in vision-language model adaptation.
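To make the memory mechanism concrete, here is a minimal, training-free sketch (assuming a PyTorch setting): cached image features serve as keys and soft label vectors as values, a cosine-similarity attention readout retrieves a label distribution, and the result is fused with zero-shot text logits. The class names, the temperature `beta`, and the fusion weights are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

class MemoryNetwork:
    """Illustrative key-value feature memory (not the authors' code).
    Keys are normalized image features; values are (soft) label vectors."""

    def __init__(self, num_classes: int):
        self.num_classes = num_classes
        self.keys = []    # cached image features, one (D,) tensor per entry
        self.values = []  # matching label distributions, one (C,) tensor per entry

    def write(self, feature: torch.Tensor, label_probs: torch.Tensor) -> None:
        # Dynamic memory: store the feature of a historical test sample
        # together with its predicted label distribution.
        # Static memory: store training features with their one-hot labels.
        self.keys.append(F.normalize(feature, dim=-1))
        self.values.append(label_probs)

    def read(self, query: torch.Tensor, beta: float = 5.0) -> torch.Tensor:
        # Training-free readout: cosine-similarity attention over cached keys,
        # returning an attention-weighted sum of the stored label vectors.
        if not self.keys:
            return torch.zeros(self.num_classes)
        K = torch.stack(self.keys)                  # (N, D)
        V = torch.stack(self.values)                # (N, C)
        q = F.normalize(query, dim=-1)              # (D,)
        attn = torch.softmax(beta * (K @ q), dim=0) # (N,)
        return attn @ V                             # (C,)


def classify(image_feat, text_feats, dyn_mem, static_mem, w_dyn=0.5, w_static=0.5):
    """Fuse zero-shot text logits with dynamic and static memory readouts.
    The fusion weights w_dyn and w_static are hypothetical hyperparameters."""
    zero_shot = F.normalize(image_feat, dim=-1) @ F.normalize(text_feats, dim=-1).T
    return zero_shot + w_dyn * dyn_mem.read(image_feat) + w_static * static_mem.read(image_feat)
```

In this sketch, the zero-shot setting uses only the dynamic memory filled during testing, while the training-free few-shot setting adds a static memory populated from the labeled support set; the learnable variant described in the paper would additionally project the query and keys before the attention readout.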