The paper "Cross-stitch Networks for Multi-task Learning" by Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert from Carnegie Mellon University introduces a novel approach to multi-task learning in Convolutional Neural Networks (CNNs). The authors propose a new unit called "cross-stitch" that combines the activations from multiple networks, allowing for the end-to-end training of shared representations across different tasks. This method generalizes across multiple tasks and significantly improves performance, especially for categories with few training examples.
The paper begins by discussing the success of multi-task learning in CNNs, highlighting the importance of learning shared representations from multiple supervisory tasks. However, existing approaches typically hand-design a specific split between shared and task-specific layers for each pair of tasks, and these architectures do not transfer well to new task combinations. To address this, the authors propose cross-stitch units, which can learn an optimal combination of shared and task-specific representations directly from data.
The introduction section explains the benefits of multi-task learning and the challenges in selecting appropriate network architectures. The authors conduct extensive empirical studies to understand the trade-offs between shared and task-specific representations, finding that the best architecture depends on the specific tasks and data.
The paper then delves into the design of cross-stitch units, explaining how they model shared representations using linear combinations of activation maps. The units are integrated into a ConvNet, and detailed ablation studies are performed to understand their training procedure. The authors also discuss design decisions such as initialization and learning rates for the cross-stitch units.
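The linear combination a cross-stitch unit computes can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the function name and toy activation shapes are assumptions, and the 2x2 mixing matrix alpha corresponds to the unit's learnable parameters, initialized near the identity so each network starts out mostly task-specific.

```python
import numpy as np

def cross_stitch(x_a, x_b, alpha):
    """Combine activation maps from two task networks with a cross-stitch unit.

    alpha is a 2x2 matrix of mixing weights (learnable in the real network).
    Each output map is a linear combination of the two input maps, applied
    elementwise at every activation location.
    """
    x_a_out = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    x_b_out = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return x_a_out, x_b_out

# Near-identity initialization: each task keeps most of its own activations
# and receives a small contribution from the other task.
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])

x_a = np.ones((1, 4, 4))   # toy activation map from network A
x_b = np.zeros((1, 4, 4))  # toy activation map from network B
y_a, y_b = cross_stitch(x_a, x_b, alpha)
```

During training, gradients flow through alpha, so the network itself decides how much sharing is useful at each layer where a unit is placed; setting alpha to the identity recovers two fully independent networks, while equal weights recover full sharing.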
Experiments are conducted on two pairs of tasks: semantic segmentation and surface normal prediction on the NYU-v2 dataset, and object detection and attribute prediction on the PASCAL VOC 2008 dataset. The results show that the cross-stitch network outperforms baseline methods, particularly for data-starved categories. The authors conclude by discussing potential future directions for studying the properties of cross-stitch units, such as their placement in the network and constraints on their weights.