25 Jan 2024 | Zhenbin Wang, Lei Zhang, Lituan Wang, Minjuan Zhu
**LanDA: Language-Guided Multi-Source Domain Adaptation**
**Authors:** Zhenbin Wang, Lei Zhang, Lituan Wang, Minjuan Zhu
**Abstract:**
Multi-Source Domain Adaptation (MSDA) aims to mitigate shifts in data distribution when transferring knowledge from multiple labeled source domains to an unlabeled target domain. However, existing MSDA techniques assume the availability of target domain images, which can be challenging to obtain. This paper proposes a novel approach called LanDA, which leverages a multimodal model with a joint image and language embedding space to guide MSDA using only textual cues. LanDA is grounded in optimal transport theory and requires only a textual description of the target domain, without any target domain images. The method first trains domain-specific augmenters to align source domain images with the target domain, then trains a linear classifier on the extended domains. Extensive experiments across several benchmarks show that LanDA outperforms standard fine-tuning and ensemble approaches in both target and source domains.
**Introduction:**
Domain adaptation is crucial for generalizing models beyond their training domain. MSDA addresses this by drawing on multiple labeled source domains to improve performance on an unseen target domain. Traditional methods often rely on visual backbones and require substantial amounts of target-domain image data. LanDA introduces a novel approach that uses Visual-Language Foundational Models (VLFMs) such as CLIP to guide MSDA without any target domain images. VLFMs align text and images within a shared embedding space, which makes them well suited to domain adaptation tasks.
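As a concrete illustration of this shared embedding space, below is a minimal sketch of zero-shot image-text matching with CLIP via the Hugging Face `transformers` library. The checkpoint name, image path, and prompt strings are illustrative assumptions, not taken from the paper:

```python
# Minimal sketch: scoring one image against textual domain descriptions with CLIP.
# Checkpoint, image path, and prompts are illustrative, not from the LanDA paper.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")
texts = ["a photo of a dog", "a sketch of a dog", "a painting of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds scaled image-text similarities in the joint space;
# a softmax over the text prompts gives matching probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs.squeeze().tolist())))
```

Because text and images land in the same space, a textual description of an unseen domain can stand in for the target-domain images that conventional MSDA requires.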
**Related Work:**
- **MSDA:** Traditional methods use statistical discrepancy or cross-domain feature extraction to address MSDA. LanDA focuses on using VLFMs and domain-specific augmenters.
- **VLFMs:** Pre-trained models like CLIP bridge the gap between images and text, but their effectiveness for domain adaptation is underexplored.
- **Optimal Transport:** The Wasserstein distance from optimal transport theory provides a principled way to compare probability distributions, and serves as the basis for LanDA's alignment objectives.
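To make the optimal-transport notion concrete, here is a minimal sketch of the Wasserstein-1 distance between two 1-D empirical samples using `scipy`; the sample data is illustrative, while LanDA applies the idea to feature distributions in CLIP's embedding space:

```python
# Minimal sketch: Wasserstein-1 distance between two empirical 1-D distributions.
# The Gaussian samples below are illustrative stand-ins for domain feature values.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
source_feats = rng.normal(loc=0.0, scale=1.0, size=1000)  # e.g., a source domain
target_feats = rng.normal(loc=0.5, scale=1.2, size=1000)  # e.g., the target domain

# In 1-D, the W1 distance equals the area between the two empirical CDFs,
# i.e., the minimal cost of transporting one distribution onto the other.
print(wasserstein_distance(source_feats, target_feats))
```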
**Language-Guided MSDA:**
LanDA consists of two stages:
1. **Training Domain-Specific Augmenters:** Domain-specific augmenters align source domain images with the target domain using domain-class alignment and distribution consistency losses.
2. **Training Linear Classifier:** A linear classifier is trained on the extended domains to classify unseen target domain images.
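A minimal sketch of both stages follows, under the assumption that each augmenter is a small residual MLP acting on frozen CLIP image features and that the losses take simple cosine and mean-matching forms. The module design, loss terms, and weights here are illustrative assumptions, not the paper's exact formulation:

```python
# Minimal sketch of LanDA's two stages on frozen CLIP features.
# Architecture, losses, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 512  # CLIP ViT-B/32 embedding size

class Augmenter(nn.Module):
    """Domain-specific augmenter: shifts source features toward the text-described target."""
    def __init__(self, dim: int = DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        # Residual shift, renormalized back onto CLIP's unit hypersphere.
        return F.normalize(x + self.net(x), dim=-1)

def stage1_loss(aug_feats, target_text_emb, src_feats):
    # aug_feats: (B, D); target_text_emb: (1, D); src_feats: (B, D)
    # Domain alignment: pull augmented features toward the target-domain text embedding.
    align = (1 - F.cosine_similarity(aug_feats, target_text_emb)).mean()
    # Distribution consistency: keep augmented features close to the source statistics
    # (a simple mean-matching proxy for the paper's transport-based consistency term).
    consist = (aug_feats.mean(0) - src_feats.mean(0)).pow(2).sum()
    return align + 0.1 * consist

# Stage 2: a linear classifier trained on source + augmented ("extended") features.
classifier = nn.Linear(DIM, 10)  # 10 classes, illustrative

def stage2_loss(feats, labels):
    return F.cross_entropy(classifier(feats), labels)
```

In this reading, the augmenters do the language-guided adaptation in feature space during stage one, so stage two reduces to ordinary supervised training of a linear head over the extended domains.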
**Experiments:**
LanDA is evaluated on multiple benchmarks, showing superior performance compared to baselines that use image generation or fine-tuning methods. Qualitative evaluations and ablation studies further validate the effectiveness of LanDA's components.
**Conclusion:**
LanDA is a novel language-guided MSDA method that leverages VLFMs to achieve domain adaptation without target domain images. It demonstrates strong performance in both target and source domains, making it a promising approach for multi-source domain adaptation tasks.