LanDA: Language-Guided Multi-Source Domain Adaptation


25 Jan 2024 | Zhenbin Wang, Lei Zhang*, Lituan Wang, Minjuan Zhu
LanDA is a language-guided multi-source domain adaptation (MSDA) method that leverages vision-language foundation models (VLFMs) to adapt multiple source domains to an unseen target domain without requiring any target-domain images. It exploits the joint image-language embedding space to align source-domain images with the target domain using only textual descriptions of that domain.

LanDA combines a domain-class alignment loss with a distribution consistency loss, ensuring that the extended domains retain domain-invariant features while discarding class-irrelevant information. It also introduces a cost function tailored to VLFMs that accounts for text embeddings, further improving adaptation performance.

The framework consists of two stages: first, domain-specific augmenters are trained to align source-domain images with the target domain in the VLFM embedding space; second, the extended domains and class-specific text embeddings are projected into a Wasserstein space to extract domain-invariant information and train a linear classifier. Evaluated on several benchmarks, LanDA outperforms standard fine-tuning and ensemble baselines in both the target and source domains, demonstrating the effectiveness of language-guided domain adaptation.
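To make the two-stage idea concrete, the sketch below is a heavily simplified, hypothetical illustration, not the paper's method: instead of trained augmenters and a Wasserstein projection, it shifts CLIP-style image embeddings along a source-to-target text direction (stage 1) and scores the shifted features against class-specific text embeddings by cosine similarity (stage 2). All embeddings and names are synthetic placeholders.

```python
import numpy as np

def normalize(x):
    """L2-normalize along the last axis (VLFM embeddings live on the unit sphere)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
dim, n_cls = 8, 3

# Hypothetical VLFM (CLIP-like) embeddings, all randomly generated for illustration.
src_imgs = normalize(rng.normal(size=(10, dim)))     # source-domain image embeddings
t_src = normalize(rng.normal(size=dim))              # text, e.g. "a sketch of an object"
t_tgt = normalize(rng.normal(size=dim))              # text, e.g. "a painting of an object"
cls_text = normalize(rng.normal(size=(n_cls, dim)))  # class-specific text embeddings

# Stage 1 (simplified): stand-in for a trained domain-specific augmenter --
# shift each image embedding along the source-to-target text direction.
extended = normalize(src_imgs + (t_tgt - t_src))     # pseudo target-domain features

# Stage 2 (simplified): stand-in for the Wasserstein-space linear classifier --
# classify extended features by cosine similarity to class text embeddings.
logits = extended @ cls_text.T
preds = logits.argmax(axis=1)
```

Since all vectors are unit-normalized, the dot products in `logits` are cosine similarities; the real method replaces the fixed directional shift with learned augmenters trained under the alignment and consistency losses described above.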