This paper addresses the issue of overfitting in vision-language models (VLMs) during fine-tuning for out-of-distribution (OOD) generalization. Existing VLMs, such as CLIP, excel at zero-shot recognition, but fine-tuning them on a closed set of classes degrades their OOD generalization. Recent methods like prompt learning and adapter tuning have improved both in-distribution (ID) and OOD accuracy, but overfitting remains a challenge. The paper proposes OGEN, a novel approach that improves OOD generalization by synthesizing OOD features from class names and introducing an adaptive self-distillation mechanism to regularize the model.
The key contributions of OGEN include: 1) a class-conditional feature generator that synthesizes OOD features for effective regularization, and 2) an adaptive self-distillation method that reduces overfitting during joint optimization. The feature generator leverages CLIP's aligned image-text feature spaces to synthesize image features for unknown classes, enabling the model to learn a more reliable decision boundary between known and unknown classes. The adaptive self-distillation mechanism further enhances regularization by transferring knowledge between model states, reducing overfitting.
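The two mechanisms can be illustrated with a minimal NumPy sketch. All names, dimensions, the linear map `W`, and the random stand-ins for CLIP embeddings below are illustrative assumptions, not OGEN's actual architecture: the paper's generator is a learned network, and this sketch only shows the data flow of synthesizing image-space features from class-name text embeddings and computing a distillation penalty against an earlier model state.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 512        # assumed CLIP joint embedding dimension
N_UNKNOWN = 4    # number of unseen class names (illustrative)
N_KNOWN = 10     # number of known (fine-tuning) classes (illustrative)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Stand-ins for CLIP text embeddings of unknown and known class names.
text_feats = l2_normalize(rng.standard_normal((N_UNKNOWN, DIM)))
known_text_feats = l2_normalize(rng.standard_normal((N_KNOWN, DIM)))

# Class-conditional feature generator: a single linear map here, as a
# placeholder for OGEN's learned generator, which synthesizes image
# features conditioned on a class-name text embedding.
W = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)
synth_image_feats = l2_normalize(text_feats @ W)

# The synthesized "OOD" features are scored against known-class text
# embeddings (cosine similarities), giving logits that can regularize
# the decision boundary between known and unknown classes.
ood_logits = synth_image_feats @ known_text_feats.T

# Adaptive self-distillation (sketch): penalize divergence between the
# current model's predictions and those of an earlier "teacher" state,
# simulated here by perturbing the student logits.
teacher_logits = ood_logits + 0.05 * rng.standard_normal(ood_logits.shape)
TAU = 0.07  # CLIP-style temperature (assumed)
p_teacher = softmax(teacher_logits / TAU)
p_student = softmax(ood_logits / TAU)
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1).mean()
```

In a real training loop, the KL term would be added to the fine-tuning loss so that knowledge is transferred between model states, which is the regularization role the paper attributes to adaptive self-distillation.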
Experiments show that OGEN consistently improves OOD generalization performance across various settings, including within-dataset and cross-dataset generalization. The method achieves significant gains in OOD accuracy, with up to an 18.77% improvement in some cases. OGEN is applicable to different fine-tuning methods and demonstrates superior generalization capabilities. The approach is model-agnostic and can be applied to other vision-language models. The results validate the effectiveness of OGEN in reducing overfitting and improving OOD generalization for VLMs.