Learning multiple visual domains with residual adapters

27 Nov 2017 | Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi
This paper addresses the challenge of learning a single visual representation that can handle a wide range of image domains, from dog breeds to stop signs and digits. Inspired by recent work on networks that predict the parameters of other networks, the authors develop a tunable deep architecture with adapter residual modules: small domain-specific modules that steer a largely shared network to each new visual domain while maintaining, or even improving, domain-specific accuracy. The design achieves a high degree of parameter sharing across domains and avoids catastrophic forgetting when learning from diverse domains; a sketch of such an adapter module appears below.

The authors also introduce the *Visual Decathlon Challenge*, a benchmark that evaluates how well a representation captures ten different visual domains simultaneously and measures performance across all of them. The challenge comprises ten representative datasets, including ImageNet, SVHN, action classification, and texture recognition. Its evaluation metric rewards models that perform well on all ten tasks, emphasizing consistency rather than average accuracy. Experiments show that the proposed method outperforms existing approaches while using significantly fewer parameters.
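To make the adapter idea concrete, here is a minimal PyTorch sketch of the kind of module the paper describes: a domain-specific 1×1 convolution with batch normalization, attached through a skip connection to the output of a shared 3×3 convolution. Class and variable names are illustrative, not the authors' code, and details such as initialization are assumptions.

```python
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Per-domain 1x1 conv adapter with a skip connection (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        # Domain-specific parameters: a 1x1 filter bank plus batch norm.
        self.conv1x1 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        # Assumption: start near zero so the adapted network initially
        # behaves like the shared one.
        nn.init.zeros_(self.conv1x1.weight)

    def forward(self, x):
        # y = x + BN(conv1x1(x)): the adapter perturbs the shared feature map.
        return x + self.bn(self.conv1x1(x))

class AdaptedConvBlock(nn.Module):
    """A shared (frozen or jointly trained) 3x3 conv followed by a
    domain-specific residual adapter."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.shared_conv = nn.Conv2d(in_channels, out_channels,
                                     kernel_size=3, padding=1, bias=False)
        self.adapter = ResidualAdapter(out_channels)

    def forward(self, x):
        return self.adapter(self.shared_conv(x))
```

Because the adapter is residual and the shared filters stay fixed per domain, switching domains only requires swapping these small 1×1 banks, which is what keeps the per-domain parameter overhead low.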
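The decathlon metric can be written as $S = \sum_d \alpha_d \max\{0,\, E_d^{\max} - E_d\}^{\gamma_d}$, where $E_d$ is the test error on domain $d$, $E_d^{\max}$ is a per-domain error ceiling, $\gamma_d = 2$, and $\alpha_d$ normalizes a perfect domain to a fixed number of points. The sketch below computes this score; the convention that $E_d^{\max}$ is twice a baseline's error and that each domain is worth 1000 points follows the paper's setup as I understand it, and the error rates in the usage line are placeholders.

```python
def decathlon_score(errors, baseline_errors, gamma=2.0, points_per_domain=1000.0):
    """Decathlon-style score over domains (illustrative sketch).

    errors: per-domain test errors of the evaluated model, in [0, 1].
    baseline_errors: per-domain errors of a reference baseline; the error
    ceiling E_max is taken as twice the baseline error, and alpha is chosen
    so a zero-error domain contributes `points_per_domain` points.
    """
    score = 0.0
    for err, base_err in zip(errors, baseline_errors):
        e_max = 2.0 * base_err                        # errors above this score zero
        alpha = points_per_domain / (e_max ** gamma)  # normalize the per-domain maximum
        score += alpha * max(0.0, e_max - err) ** gamma
    return score

# Hypothetical usage with placeholder error rates for three domains:
print(decathlon_score([0.10, 0.05, 0.20], [0.15, 0.08, 0.25]))
```

The quadratic exponent is what makes the score reward consistency: a model that is mediocre everywhere earns fewer points than one that is strong across all ten domains, even at the same average error.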