21 May 2016 | Mingsheng Long, Jianmin Wang, Michael I. Jordan
This paper introduces Joint Adaptation Networks (JAN), a deep transfer learning approach that adapts the joint distributions of layer activations across domains. Unlike previous methods, which adapt marginal and conditional distributions separately under restrictive assumptions, JAN compares joint distributions directly using a joint distribution discrepancy (JDD). The discrepancy is computed by embedding joint distributions into reproducing kernel Hilbert spaces, yielding a differentiable criterion that can be trained efficiently via back-propagation. JAN is built on deep convolutional networks, where dataset shift can occur in multiple task-specific feature layers as well as the classifier layer; by minimizing JDD, JAN matches the joint distributions of these layers' activations across the source and target domains.

Experiments on standard domain adaptation benchmarks show that JAN achieves state-of-the-art results, outperforming existing methods on most transfer tasks, with the clearest gains on the most challenging ones. The paper also addresses limitations of earlier methods, notably their inability to handle joint distribution shift, and proposes a principled way to model multi-layer features in a kernel embedding framework. The results underscore the importance of joint distribution adaptation in deep neural networks for effective domain adaptation.
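To make the kernel-embedding idea concrete, here is a minimal numpy sketch of a JDD-style statistic, assuming the common construction in which the joint kernel over several layers is the element-wise (tensor) product of per-layer Gaussian kernels. The function names, the fixed bandwidth `sigma`, and the two-sample form are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of X and rows of Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def joint_distribution_discrepancy(source_layers, target_layers, sigma=1.0):
    """Illustrative joint-distribution discrepancy between two domains.

    source_layers / target_layers: lists of (n_samples, dim) activation
    arrays, one per adapted layer (e.g. fc-layer features and classifier
    outputs). The joint kernel is the element-wise product of the
    per-layer kernels, mirroring the product kernel used to embed joint
    distributions in an RKHS; the returned value is an MMD-style
    two-sample statistic under that joint kernel.
    """
    n = source_layers[0].shape[0]
    m = target_layers[0].shape[0]
    Kss = np.ones((n, n))  # source-source joint kernel
    Ktt = np.ones((m, m))  # target-target joint kernel
    Kst = np.ones((n, m))  # source-target joint kernel
    for S, T in zip(source_layers, target_layers):
        Kss *= gaussian_kernel(S, S, sigma)
        Ktt *= gaussian_kernel(T, T, sigma)
        Kst *= gaussian_kernel(S, T, sigma)
    return Kss.mean() + Ktt.mean() - 2.0 * Kst.mean()
```

Because the statistic is a smooth function of the activations, its gradient with respect to network parameters is available, which is what allows a network to minimize it jointly with the classification loss via back-propagation; identical source and target samples give a discrepancy of zero, while a domain shift in any adapted layer pushes it up.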